In 1983, Altman and Bland1 described a technique for comparing two different methods of making a clinical measurement. In a follow-up article, Bland and Altman2 described the method in further detail, including its application to the analysis of repeatability data. The method is now very widely used in medicine and eye research. The 1986 article has since been cited more than 23,000 times, and the clinical ophthalmic literature is well represented among the citing articles. For example, three ophthalmic journals are represented in the top 25 journals for citing the article: Investigative Ophthalmology and Visual Science (ranked ninth, with more than 180 citing articles), Optometry and Vision Science (ranked 13th, with more than 160 citing articles), and Ophthalmic and Physiological Optics (ranked 24th, with more than 130 citing articles).3
The method will be discussed in more depth below. Briefly, Bland-Altman plots take the following format. Each member of a group of participants is measured using each method. For each pair of measurements, the difference is calculated, and this difference is then plotted against the average of the pair. It is common practice to show the mean of the differences on the plot as a reference line and also to show 95% limits of agreement (LoAs). Limits of agreement give researchers a way of assessing the range of variability between the two measurements. They can be evaluated against, or compared with, predetermined tolerances to decide whether given techniques have clinically acceptable agreement or repeatability.4 Limits of agreement are calculated as the mean of differences ±1.96 SDs of the differences. In their 1986 article,2 Bland and Altman stated that “if the data are normally distributed, then 95% of differences will lie between these limits.” Armstrong et al.5 state that the LoAs represent the range “in which it would be expected that 95% of the differences between the two methods would fall.” It is likely that most authors and readers think about the LoAs in this fashion, but neither of these explanations is strictly true. The LoAs on a Bland-Altman plot are only sample estimates of the population LoAs. The LoAs in a sample, particularly a small one, may differ considerably from the population LoAs. Moreover, for any finite sample, the sample LoA is a biased estimate of the population LoA: on average, it tends to lie slightly closer to the mean of differences than the population LoA does. So, if authors use LoAs as an estimate of the range over which one would expect 95% of differences to lie, or if readers are likely to interpret LoAs in this way, then it is useful to have a method of estimating how reliable the sample LoAs are.
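The downward bias of the sample LoA can be checked numerically. The sketch below is an illustration only, assuming normally distributed differences with population mean 0 and SD 1, so that the population upper LoA is exactly 1.96. It uses the standard result that the expected sample SD is c4(n)σ, where c4 < 1, and confirms it with a small Monte Carlo simulation; the function names are hypothetical.

```python
import math
import random


def expected_sample_upper_loa(n, z=1.96):
    """Expected sample upper LoA (mean + z * SD) for n differences from N(0, 1).

    E[mean] = 0 and E[s] = c4(n), the usual bias constant for the sample SD,
    so the expectation is z * c4(n), which is below the population value z.
    """
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    c4 = math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)
    return z * c4


def simulated_sample_upper_loa(n, reps=10000, z=1.96, seed=1):
    """Monte Carlo average of the sample upper LoA for N(0, 1) differences."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        d = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(d) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in d) / (n - 1))  # n - 1 denominator
        total += mean + z * sd
    return total / reps


for n in (5, 17, 50):
    print(n, round(expected_sample_upper_loa(n), 3),
          round(simulated_sample_upper_loa(n), 3))
```

For n = 5 the expected sample upper LoA is about 1.84 rather than 1.96, and the gap shrinks as n grows.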
One way of doing this is to estimate confidence intervals for the LoAs. Confidence intervals describe the range within which a given population parameter (in this case an LoA) is likely to lie with a given probability (usually 95%). In their 1986 article,2 Bland and Altman described an approximate method for calculating such 95% confidence limits for LoAs, and they described the derivation of the method in a 1999 article.6 McAlinden et al.,7 in an article that forms part of the statistical guidelines for authors of Ophthalmic and Physiological Optics,8 recommended Bland and Altman’s method in the following strong terms: “As the limits of agreement are only estimates, confidence intervals should be calculated and reported”; in a subsequent letter, they further justified the use of confidence intervals for LoAs.9 Other authors have also highlighted the value of using confidence intervals in interpreting the practical significance of LoAs,4 one arguing that “limits of agreement should never be presented or interpreted without confidence intervals and that the inclusion of confidence intervals should become standard practice in the literature.”10
Despite the apparent value of confidence intervals for LoAs, they are not widely reported. A brief search of Optometry and Vision Science showed one article11 that reports confidence limits for Bland-Altman LoAs using the Bland and Altman approximate method, but the vast majority of articles using Bland-Altman methods do not, including an article to which the current author contributed.12 This is not unique to optometric literature; for example, in a review of 42 method comparison articles that used Bland-Altman analysis in anesthesia journals, only two articles reported confidence intervals for LoAs.4
Nevertheless, especially for small sample sizes, confidence intervals for LoAs may be a useful metric for researchers to consider. However, as will be seen, for such small sample sizes, the Bland and Altman approximate method does not accurately describe confidence intervals for LoAs. It is one of a number of parametric methods in the literature for calculating confidence intervals for LoAs, all of which are approximations2,6,7,13–16 and most of which assume that a large sample is used to estimate the LoAs.2,6,7,14,15 Also, most of these methods are based on an implicit assumption that only one LoA of the pair is being considered,2,6,7,14–16 a process known as determining one-sided tolerance factors. In the main application of Bland-Altman analysis, however, authors and readers are interested in the LoAs as a pair and in the probability that at least 95% of the population lies between them, a problem that can be treated as determining two-sided tolerance intervals. To date, only one author has specifically provided an approach to determining confidence intervals for LoAs considered as a pair, and that approach uses an approximate method.13 However, given the standard Bland-Altman assumption of normally distributed data, and without assuming large sample sizes, it is possible to calculate exact confidence intervals for LoAs, considered individually and as a pair, using techniques from the literature on estimating one- and two-sided tolerance factors for a normal distribution.17–20 To date, there has been no description in the literature of how to apply these methods to calculate exact confidence intervals for Bland-Altman LoAs.
Therefore, the purpose of this article is to describe, in a way that will be useful for clinical scientists, how to calculate exact confidence intervals for Bland-Altman LoAs considered either individually or as a pair based solely on the standard Bland-Altman assumption of normally distributed data. Such methods will be most useful for small sample sizes but can be applied appropriately for any sample size. As an example for readers, the different methods will be applied to Bland and Altman’s 1986 data set2 and contrasted with Bland and Altman’s approximate method.6 Tables will be provided that simplify the calculations of confidence intervals for LoAs. Finally, examples from ophthalmic clinical sciences literature will be used to show the value of calculating confidence intervals for LoAs based on moderate and small sample sizes.
A Brief Background on Bland-Altman Plots
For readers who are unfamiliar with the method, this section contains a brief description of Bland-Altman plots, as background. Fig. 1 shows the principles of the method using the data originally provided and analyzed by Bland and Altman.2 The data are peak expiratory flow rate measurements made with a Wright peak flow meter and also with a Mini Wright peak flow meter on 17 patients (n = 17). The typical Bland-Altman analysis shown in Fig. 1 has the difference d between the two meters (Wright–Mini Wright) plotted on the y axis and the mean $x_{\text{ave}}$ of the two measurements plotted on the x axis. A horizontal reference line showing the mean of the differences (in this case $\bar{d}$ = −2.1 L/min) is typically included in Bland-Altman plots. For the data in Fig. 1, the SD of the differences ($s_{\text{diff}}$) was 38.8 L/min.
Bland-Altman plots usually also include horizontal lines to denote 95% LoAs. If the data were normally distributed and the sample were very large, one would expect approximately 95% of the differences (d) in the sample to lie within 1.96 SD (i.e., $1.96\,s_{\text{diff}}$) of the mean of differences $\bar{d}$. Thus, the upper LoA is given by:
$$\text{LoA}_{\text{upper}} = \bar{d} + 1.96\,s_{\text{diff}}$$
which in this example = −2.1 + 1.96 × 38.8 = 73.9 L/min. The lower LoA is given by:
$$\text{LoA}_{\text{lower}} = \bar{d} - 1.96\,s_{\text{diff}}$$
which in this example = −2.1 − 1.96 × 38.8 = −78.1 L/min.
These LoAs, as shown in Fig. 1, are actually slightly different from those shown in Bland and Altman2 because they used the slightly more conservative and easier-to-calculate LoAs of $\bar{d} \pm 2s_{\text{diff}}$. Most authors currently use the more precise definition: $\bar{d} \pm 1.96\,s_{\text{diff}}$.
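The calculation of the plotted points, the mean difference, and the 95% LoAs can be sketched in Python as follows. This is a minimal illustration using only the standard library; the function names and the small paired data set in the usage lines are hypothetical and are not the peak flow data of Fig. 1.

```python
import statistics


def bland_altman_points(method_a, method_b):
    """Return (average, difference) pairs for a Bland-Altman plot.

    Differences are taken as method_a - method_b (e.g., Wright - Mini Wright).
    """
    if len(method_a) != len(method_b):
        raise ValueError("methods must have the same number of measurements")
    return [((a + b) / 2, a - b) for a, b in zip(method_a, method_b)]


def limits_of_agreement(differences, z=1.96):
    """Return (mean difference, SD of differences, lower LoA, upper LoA).

    Uses the sample SD (n - 1 denominator); z = 2 reproduces the original
    Bland and Altman convention of mean +/- 2 SD.
    """
    d_bar = statistics.mean(differences)
    s_diff = statistics.stdev(differences)
    return d_bar, s_diff, d_bar - z * s_diff, d_bar + z * s_diff


# Hypothetical paired readings from two instruments
a = [10, 12, 11, 14]
b = [9, 12, 13, 12]
points = bland_altman_points(a, b)
d_bar, s_diff, lower, upper = limits_of_agreement([d for _, d in points])
print(f"mean diff {d_bar:.2f}, LoAs {lower:.2f} to {upper:.2f}")
```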
The mean of differences $\bar{d}$ and the LoAs are statistics that describe characteristics of the data sample itself. However, in many cases, researchers and readers are concerned with how these statistics describe the population from which the sample is drawn.21 This can be done by using the information in the sample to estimate confidence intervals for a given parameter. Confidence intervals describe the range within which a parameter is likely to lie with a given probability, often 95%.
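It is very common practice to calculate 95% confidence intervals for a mean such as $\bar{d}$. Assuming normally distributed data, this is accomplished using the standard error of the mean and the t distribution with the equation:
$$\bar{d} \pm t_{0.975,\,n-1}\,\frac{s_{\text{diff}}}{\sqrt{n}}$$
where $t_{0.975,\,n-1}$ is the critical value of the t distribution with n − 1 degrees of freedom.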
For the data in Fig. 1, the critical t value for 16 degrees of freedom is 2.12. This gives a confidence interval from −22.0 to 17.8 L/min, indicated by error bars on the $\bar{d}$ line in Fig. 1. In other words, it can be said that there is a 95% probability that the population mean $\mu_d$ of the d values lies between −22.0 and 17.8 L/min.
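The t-based interval for the mean difference can be reproduced from the Fig. 1 summary statistics with a few lines of Python. This sketch assumes SciPy is available and uses `scipy.stats.t.ppf` for the critical value; the function name is hypothetical.

```python
import math
from scipy import stats


def mean_difference_ci(d_bar, s_diff, n, confidence=0.95):
    """t-based confidence interval for the population mean difference."""
    t_crit = stats.t.ppf(0.5 + confidence / 2.0, df=n - 1)  # about 2.12 for 16 df
    half_width = t_crit * s_diff / math.sqrt(n)  # t * standard error of the mean
    return d_bar - half_width, d_bar + half_width


# Fig. 1 summary statistics: d_bar = -2.1 L/min, s_diff = 38.8 L/min, n = 17
low, high = mean_difference_ci(-2.1, 38.8, 17)
print(f"95% CI for the mean difference: {low:.1f} to {high:.1f} L/min")
```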
It is also possible to calculate 95% confidence intervals for LoAs under the assumption that the data are normally distributed. Bland and Altman2,6 described an approximate method for estimating such confidence limits, and an example is shown as error bars on the LoAs in Fig. 1. This approximation assumes that the underlying differences d are normally distributed, that the variance of the differences is independent of their mean, and that the sample size is large. Previous researchers2,6 using these assumptions have approximated the standard error of an individual LoA as
$$\text{SE}_{\text{LoA}} \approx \sqrt{\frac{3\,s_{\text{diff}}^2}{n}}$$
or, more precisely, as
$$\text{SE}_{\text{LoA}} \approx \sqrt{\left(\frac{1}{n} + \frac{1.96^2}{2n}\right)s_{\text{diff}}^2}$$
(since 1 + 1.96²/2 ≈ 2.92, the two expressions are close). The approximation assumes that the sampling distribution of $s_{\text{diff}}$ is normal and that the sampling distribution of $\bar{d} + 1.96\,s_{\text{diff}}$ follows a t distribution. These assumptions are approximately true for large values of n. The equations for confidence intervals by this approximate method from Bland and Altman6 are:
$$\text{upper LoA: } \bar{d} + 1.96\,s_{\text{diff}} \pm t_{0.975,\,n-1}\,\text{SE}_{\text{LoA}}$$
$$\text{lower LoA: } \bar{d} - 1.96\,s_{\text{diff}} \pm t_{0.975,\,n-1}\,\text{SE}_{\text{LoA}}$$
In Fig. 1, these give approximate 95% confidence intervals, symmetrically distributed around the LoAs, ranging from 39.8 to 108.0 L/min for the upper LoA and, for the lower LoA, between −44.0 and −112.2 L/min.
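The Bland and Altman approximate intervals can be sketched as follows, again from the Fig. 1 summary statistics. This assumes SciPy for the t quantile and uses the more precise standard error above, with 2n in the denominator; the function name is hypothetical.

```python
import math
from scipy import stats


def approx_loa_cis(d_bar, s_diff, n, z=1.96, confidence=0.95):
    """Bland and Altman approximate CIs for the lower and upper LoAs.

    SE(LoA) = s_diff * sqrt(1/n + z**2 / (2n)); each CI is LoA +/- t * SE,
    with t taken from a t distribution on n - 1 degrees of freedom.
    The intervals are symmetric about each LoA by construction.
    """
    se = s_diff * math.sqrt(1.0 / n + z ** 2 / (2.0 * n))
    t_crit = stats.t.ppf(0.5 + confidence / 2.0, df=n - 1)
    lower_loa, upper_loa = d_bar - z * s_diff, d_bar + z * s_diff
    lower_ci = (lower_loa - t_crit * se, lower_loa + t_crit * se)
    upper_ci = (upper_loa - t_crit * se, upper_loa + t_crit * se)
    return lower_ci, upper_ci


lower_ci, upper_ci = approx_loa_cis(-2.1, 38.8, 17)
print("upper LoA CI:", [round(v, 1) for v in upper_ci])
print("lower LoA CI:", [round(v, 1) for v in lower_ci])
```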
Exact Confidence Intervals for Upper or Lower LoAs Considered Individually
The Bland and Altman approximate method is derived using an implicit assumption that one is calculating a confidence interval for either the upper LoA or the lower LoA but not both at the same time (i.e., LoAs considered individually). Owen20 described the problem (for the upper LoA) as follows: given a sample estimate of the mean (in this case $\bar{d}$) and a sample estimate of the SD (in this case $s_{\text{diff}}$), what value of k satisfies the criterion that at least a proportion P of the population is less than $\bar{d} + k\,s_{\text{diff}}$ with confidence γ? For Bland-Altman 95% LoAs, P = 0.975 (corresponding to a z score of 1.96) and, for two-tailed 95% confidence limits, k values are determined for γ = 0.025 and γ = 0.975. The same k values can also be used to determine 95% confidence limits for the lower LoA as $\bar{d} - k\,s_{\text{diff}}$. Even though this typically involves determining two-tailed confidence limits, the process is formally known as determining “one-sided tolerance factors for a normal distribution”20 because only one LoA is considered at a time (either the upper or the lower LoA). For the Bland and Altman approximation:
$$k_{\gamma} \approx 1.96 + t_{\gamma,\,n-1}\sqrt{\frac{1}{n} + \frac{1.96^2}{2n}}$$
where $t_{\gamma,\,n-1}$ is the γ quantile of the t distribution with n − 1 degrees of freedom (negative for γ = 0.025).
As previously stated, the Bland and Altman method may be inaccurate if n is not large. In fact, there are closer but numerically more complicated approximations than Bland and Altman’s that can be used for determining confidence limits for LoAs considered individually (e.g., the Method of Variance Estimates Recovery [MOVER]).14,15
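However, such approximations may be unnecessary, because exact 95% confidence intervals for LoAs (considered individually) can be calculated for any sample size, large or small, assuming that the population from which d is drawn is normally distributed. Using the equations of Owen,20 k can be determined for P = 0.975 using a noncentral t distribution. For a given value of γ, the appropriate critical value is the k that satisfies:
$$\Pr\!\left(T'_{\nu}(\delta) \le k\sqrt{n}\right) = \gamma, \qquad \delta = z_P\sqrt{n}$$
where $T'_{\nu}(\delta)$ is a noncentral t variable with ν = n − 1 degrees of freedom and noncentrality parameter δ, and $z_P$ = 1.96 is the standard normal quantile for P = 0.975. Equivalently, $k = t'_{\nu,\gamma}(\delta)/\sqrt{n}$, where $t'_{\nu,\gamma}(\delta)$ is the γ quantile of that noncentral t distribution.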
As an aid for researchers, coefficients $c_{0.025}$ and $c_{0.975}$ have been calculated as the k values corresponding to γ values of 0.025 and 0.975. These coefficients are included in Table 1 for different degrees of freedom ν = n − 1. Table 1 also includes $c_{0.05}$ and $c_{0.95}$, which could be used to compute two-tailed 90% confidence limits or one-tailed 95% confidence limits, and also $c_{0.5}$, which may be used for determining median values for the confidence interval. The values have been computed to better than 10 decimal places using MATLAB and rounded to four decimal places in the table. Table 1 is abridged, but a more complete set of tables is available as a supplementary file (Appendix Table 1, available at http://links.lww.com/OPX/A197). Similar tables have been published to three decimal places by Odeh and Owen18 and in part, with small inaccuracies, by Owen.20 As a further aid to researchers, a supplementary file contains MATLAB code for calculating the coefficients found in Table 1 (see MATLAB Kcalculator code, available at http://links.lww.com/OPX/A199).
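For readers working outside MATLAB, the noncentral t relation above can be evaluated directly. The sketch below is a Python/SciPy equivalent under stated assumptions, not the author's Kcalculator code: it uses `scipy.stats.nct.ppf(q, df, nc)` and takes $z_P$ as the exact normal quantile (about 1.95996) rather than the rounded 1.96. The function name is hypothetical.

```python
import math
from scipy import stats


def exact_individual_coefficient(nu, gamma, p=0.975):
    """One-sided tolerance factor c_gamma for a normal distribution (Owen).

    With probability gamma, at least a proportion p of the population lies
    below d_bar + c_gamma * s_diff. Computed as k = t'_{nu, gamma}(delta) / sqrt(n),
    the gamma quantile of a noncentral t with nu = n - 1 degrees of freedom
    and noncentrality delta = z_p * sqrt(n), divided by sqrt(n).
    """
    n = nu + 1
    delta = stats.norm.ppf(p) * math.sqrt(n)
    return stats.nct.ppf(gamma, nu, delta) / math.sqrt(n)


for g in (0.025, 0.05, 0.5, 0.95, 0.975):
    print(g, round(exact_individual_coefficient(16, g), 4))
```

For 16 degrees of freedom this should reproduce the Table 1 values used in the worked example for Fig. 1.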
The exact and approximate confidence intervals differ for small sample sizes. To show this, the probability density function for the exact confidence interval for LoAs (considered individually) is shown in Fig. 2B, along with the probability density function for the approximate method (Fig. 2A), assuming n = 5 (i.e., ν = 4 degrees of freedom) and given $\bar{d}$ = 0 and $s_{\text{diff}}$ = 1. The 1.96 line in Fig. 2 shows the LoA.
By way of example, exact confidence intervals for 95% LoAs, considered individually, are shown in Fig. 1, calculated as follows.
For the upper LoA, exact 95% confidence intervals are given by
$$\bar{d} + c_{0.025}\,s_{\text{diff}} \quad \text{and} \quad \bar{d} + c_{0.975}\,s_{\text{diff}}$$
For the lower LoA, exact 95% confidence intervals are given by
$$\bar{d} - c_{0.025}\,s_{\text{diff}} \quad \text{and} \quad \bar{d} - c_{0.975}\,s_{\text{diff}}$$
From Table 1, for 16 degrees of freedom, $c_{0.025}$ = 1.3150 and $c_{0.975}$ = 3.1483. So, in Fig. 1, for the upper LoA, 95% confidence intervals are bounded by −2.1 + 1.3150 × 38.8 = 48.9 L/min and −2.1 + 3.1483 × 38.8 = 120.0 L/min. For the lower LoA, 95% confidence intervals are bounded by −2.1 − 1.3150 × 38.8 = −53.1 L/min and −2.1 − 3.1483 × 38.8 = −124.2 L/min.
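Once the coefficients are known, the exact intervals are simple linear functions of $\bar{d}$ and $s_{\text{diff}}$. The sketch below applies the Table 1 values quoted above for 16 degrees of freedom to the Fig. 1 summary statistics. Because $\bar{d}$ and $s_{\text{diff}}$ are entered rounded to one decimal place, the last digit may differ slightly from the values given in the text. The function name is hypothetical.

```python
def exact_individual_loa_cis(d_bar, s_diff, c_025, c_975):
    """Exact 95% CIs for each LoA considered individually (Table 1 coefficients).

    Returns (lower-LoA CI, upper-LoA CI), each ordered from smaller to larger.
    The intervals are asymmetric about the LoAs because c_975 - 1.96 > 1.96 - c_025.
    """
    upper_ci = (d_bar + c_025 * s_diff, d_bar + c_975 * s_diff)
    lower_ci = (d_bar - c_975 * s_diff, d_bar - c_025 * s_diff)
    return lower_ci, upper_ci


# Fig. 1: mean difference -2.1 L/min, SD 38.8 L/min, 16 degrees of freedom
lower_ci, upper_ci = exact_individual_loa_cis(-2.1, 38.8, 1.3150, 3.1483)
print("upper LoA CI:", [round(v, 1) for v in upper_ci])
print("lower LoA CI:", [round(v, 1) for v in lower_ci])
```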
Figs. 1 and 2 illustrate a number of general differences between the two methods for calculating LoA confidence intervals (considered individually). First, exact confidence limits are asymmetric about the LoAs, in contrast to the symmetric limits of the Bland and Altman approximate method. The inner bound of the exact 95% confidence interval always lies closer to the LoA than the outer bound does; compared with the approximate method, the exact inner bound lies closer to the LoA and the exact outer bound lies farther from it. However, this asymmetry becomes smaller as sample size increases. As a guide, for LoAs considered individually, the difference between exact and approximate 95% inner confidence limits becomes less than 0.1 $s_{\text{diff}}$ when degrees of freedom become greater than 37. The difference between exact and approximate 95% outer confidence limits becomes less than 0.1 $s_{\text{diff}}$ when degrees of freedom become greater than 44.
In addition, the total magnitude of the approximate 95% confidence interval (the difference between inner and outer bounds) is always slightly smaller than the total magnitude of the exact confidence interval. This difference in total magnitude becomes smaller as n increases, being less than 0.1 $s_{\text{diff}}$ for degrees of freedom greater than 13.
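The comparison between the two methods can be made concrete by computing both sets of coefficients, in units of $s_{\text{diff}}$, for a few degrees of freedom. This is a self-contained sketch assuming SciPy; the exact coefficients use the noncentral t relation given earlier and the approximate ones use the Bland and Altman formula for $k_\gamma$. The function names are hypothetical.

```python
import math
from scipy import stats


def exact_c(nu, gamma, p=0.975):
    """Exact one-sided tolerance factor via the noncentral t distribution."""
    n = nu + 1
    delta = stats.norm.ppf(p) * math.sqrt(n)
    return stats.nct.ppf(gamma, nu, delta) / math.sqrt(n)


def approx_c(nu, gamma, z=1.96):
    """Bland and Altman approximation: z + t_gamma * sqrt(1/n + z^2 / (2n))."""
    n = nu + 1
    return z + stats.t.ppf(gamma, nu) * math.sqrt(1.0 / n + z ** 2 / (2.0 * n))


def compare(nu):
    """Inner/outer coefficients and the gap in total width (units of s_diff)."""
    e_in, e_out = exact_c(nu, 0.025), exact_c(nu, 0.975)
    a_in, a_out = approx_c(nu, 0.025), approx_c(nu, 0.975)
    return {
        "exact": (e_in, e_out),
        "approx": (a_in, a_out),
        "width_gap": (e_out - e_in) - (a_out - a_in),
    }


for nu in (4, 16, 40, 100):
    r = compare(nu)
    print(nu, [round(v, 4) for v in r["exact"]],
          [round(v, 4) for v in r["approx"]], round(r["width_gap"], 4))
```

At 16 degrees of freedom the exact inner bound (about 1.315) sits closer to 1.96 than the approximate one (about 1.081), while the exact outer bound (about 3.148) sits farther out than the approximate one (about 2.839).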
Exact Confidence Intervals for Upper and Lower LoAs Considered as a Pair
Bland-Altman LoAs are usually considered as a pair of bounds, so for many applications it may be more appropriate to treat them as a pair when calculating their confidence limits. The methods demonstrated so far in this article, however, only estimate confidence intervals for LoAs considered individually. There are also methods in which confidence intervals are obtained for the LoAs considered as a pair. These arise from the literature on what is formally known as “two-sided tolerance factors for a normal distribution.” The problem can be stated as: given a sample estimate of the mean (in this case $\bar{d}$) and a sample estimate of the SD (in this case $s_{\text{diff}}$), what value of $k_t$ satisfies the criterion that at least a proportion P of the population lies between $\bar{d} \pm k_t\,s_{\text{diff}}$ with confidence γ?20
Ludbrook13 was the first to apply this two-sided tolerance factor approach to the problem of calculating confidence limits for Bland-Altman LoAs. He used existing tables22 to provide an upper 95% confidence limit (i.e., a one-tailed bound) for LoAs considered as a pair. These tables contained approximate coefficients, being reproductions of tables from Bowker23 published in 1947. The coefficients in Bowker’s tables were calculated using approximate methods described by Wald and Wolfowitz.19 However, for any value of γ, the coefficient $k_t$ can be calculated exactly by finding the value of $k_t$ that satisfies the equation:
$$\gamma = \sqrt{\frac{2n}{\pi}}\int_{0}^{\infty}\Pr\!\left(\chi^2_{\nu} > \frac{\nu\,r^2}{k_t^2}\right)e^{-n x^2/2}\,dx$$
where $\chi^2_{\nu}$ is a chi-square variable with ν = n − 1 degrees of freedom, and r = r(x) is the root of the equation
$$\int_{x-r}^{x+r}\varphi(t)\,dt = P$$
where φ(t) is the standard normal probability density function.17,24
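The integral equation above can be solved numerically in a few steps: find r(x) for each x, integrate, then search for the $k_t$ that gives the target γ. The sketch below is a Python/SciPy illustration, not the author's Ktcalculator code or Odeh's algorithm; it uses `scipy.integrate.quad`, `scipy.optimize.brentq`, `scipy.special.ndtr` (normal CDF), and `scipy.special.chdtrc` (chi-square survival function). Truncating the integral at x = 10/√n is an assumption of this sketch, justified because the weight exp(−nx²/2) is negligible beyond that point. The function names are hypothetical.

```python
import math
from scipy import integrate, optimize, special


def _half_width(x, p):
    """Root r of Phi(x + r) - Phi(x - r) = p (normal mass over [x - r, x + r])."""
    return optimize.brentq(
        lambda r: special.ndtr(x + r) - special.ndtr(x - r) - p, 0.0, abs(x) + 10.0)


def confidence_for_kt(kt, nu, p=0.95):
    """gamma(kt) from the integral equation, with n = nu + 1."""
    n = nu + 1

    def integrand(x):
        r = _half_width(x, p)
        # Pr(chi-square_nu > nu * r^2 / kt^2), weighted by exp(-n x^2 / 2)
        return special.chdtrc(nu, nu * r * r / (kt * kt)) * math.exp(-n * x * x / 2.0)

    # exp(-n x^2 / 2) < 1e-21 beyond x = 10 / sqrt(n), so truncate there
    value, _ = integrate.quad(integrand, 0.0, 10.0 / math.sqrt(n),
                              epsabs=1e-10, epsrel=1e-8, limit=200)
    return math.sqrt(2.0 * n / math.pi) * value


def exact_pair_coefficient(nu, gamma, p=0.95):
    """Solve gamma(kt) = gamma for kt; gamma(kt) increases with kt."""
    lo, hi = 0.1, 5.0
    while confidence_for_kt(hi, nu, p) < gamma:
        hi *= 2.0  # widen the bracket for very small samples
    return optimize.brentq(
        lambda k: confidence_for_kt(k, nu, p) - gamma, lo, hi, xtol=1e-8)


for g in (0.025, 0.975):
    print(g, round(exact_pair_coefficient(16, g), 4))
```

For 16 degrees of freedom this should give values close to the Table 2 coefficients used for Fig. 1 (1.4900 and 3.0824).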
Table 2 contains coefficients calculated by the author with MATLAB using these equations and the method described by Odeh.17 Degrees of freedom have been defined as ν = n − 1. Values for $k_t$, rounded to four decimal places, were calculated for P = 0.95 by iterative methods, with γ values accurate to within eight decimal places. Coefficients $k_{t,0.025}$, $k_{t,0.05}$, $k_{t,0.50}$, $k_{t,0.95}$, and $k_{t,0.975}$ are the $k_t$ values corresponding to γ values of 0.025, 0.05, 0.50, 0.95, and 0.975. Similar tables have previously been published for exact $k_t$ values to three decimal places17,18 and also for approximate19 $k_t$ values to three decimal places.23 Table 2 is abridged, but a more complete set of tables is available as a supplementary file (Appendix Table 2, available at http://links.lww.com/OPX/A198). As a further aid to researchers, a supplementary file contains MATLAB code for calculating the coefficients found in Table 2 (see MATLAB Ktcalculator code, available at http://links.lww.com/OPX/A200).
The above equations have also been used to derive Fig. 2C, which shows the probability density functions for LoAs considered as a pair given $\bar{d}$ = 0 and $s_{\text{diff}}$ = 1, assuming n = 5 (i.e., ν = 4 degrees of freedom).
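This approach of calculating confidence limits for LoAs considered as a pair is probably the most suitable, in general, for determining confidence intervals for Bland-Altman LoAs. Confidence limits can be given by
$$\text{inner bounds: } \bar{d} \pm k_{t,0.025}\,s_{\text{diff}} \qquad \text{outer bounds: } \bar{d} \pm k_{t,0.975}\,s_{\text{diff}}$$
where $k_{t,\gamma}$ denotes the Table 2 coefficient $k_t$ for confidence γ.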
Using the data in Fig. 1 as an example, with ν = 16, one can calculate a 2.5% lower bound using the coefficient $k_{t,0.025}$ = 1.4900 to determine that there is a 2.5% probability that at least 95% of the differences in the population lie between the bounds of −2.1 ± 1.4900 × 38.8 L/min, that is, −2.1 ± 57.8 L/min. One could also calculate a 97.5% upper bound using the coefficient $k_{t,0.975}$ = 3.0824 to determine that there is a 97.5% probability that at least 95% of the differences in the population lie between the bounds of −2.1 ± 3.0824 × 38.8 L/min, that is, −2.1 ± 119.6 L/min. Thus, there is a 95% probability that the inner limits of −2.1 ± 57.8 L/min (−59.9 and 55.7 L/min) enclose less than 95% of the differences in the population while the outer limits of −2.1 ± 119.6 L/min (−121.7 and 117.5 L/min) enclose at least 95% of them; in other words, the population LoAs lie outside the inner limits and inside the outer limits. These confidence limits have been included as a pair of error bars on the LoAs in Fig. 1.
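The pair-wise calculation for Fig. 1 can be sketched the same way, using the two Table 2 coefficients quoted above for 16 degrees of freedom. The function name is hypothetical; each LoA's confidence interval runs from its inner bound to its outer bound.

```python
def pair_loa_cis(d_bar, s_diff, kt_025, kt_975):
    """Exact 95% CIs for the LoAs considered as a pair (Table 2 coefficients).

    Inner bounds are d_bar +/- kt_025 * s_diff and outer bounds are
    d_bar +/- kt_975 * s_diff; each LoA's CI spans its inner and outer bound.
    """
    inner = (d_bar - kt_025 * s_diff, d_bar + kt_025 * s_diff)
    outer = (d_bar - kt_975 * s_diff, d_bar + kt_975 * s_diff)
    lower_loa_ci = (outer[0], inner[0])
    upper_loa_ci = (inner[1], outer[1])
    return lower_loa_ci, upper_loa_ci


# Fig. 1: d_bar = -2.1 L/min, s_diff = 38.8 L/min, 16 degrees of freedom
lower_ci, upper_ci = pair_loa_cis(-2.1, 38.8, 1.4900, 3.0824)
print("lower LoA CI:", [round(v, 1) for v in lower_ci])
print("upper LoA CI:", [round(v, 1) for v in upper_ci])
```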
Figs. 1 and 2 show subtle differences between exact confidence intervals for LoAs considered individually and as a pair. These differences are also apparent when comparing Tables 1 and 2.
The exact confidence intervals for LoAs, considered as a pair, will also differ from those obtained by the Bland and Altman approximate method, but the difference depends on the sample size. The sample size at which this difference is acceptable, and at which the methods become interchangeable, will be a matter of judgment for researchers. As a guide, however, for LoAs considered as a pair, the difference between the exact 95% inner confidence limits and the Bland and Altman approximation becomes less than 0.1 sdiff when degrees of freedom become greater than 112. The difference between the exact and the approximate 95% outer confidence limits becomes less than 0.1 sdiff when degrees of freedom become greater than 26. In addition, the total magnitude of the approximate 95% confidence interval (the difference between inner and outer bounds) is smaller than the total magnitude of the exact confidence intervals for degrees of freedom less than 8 and is larger for all other degrees of freedom. This absolute difference in total magnitude is less than 0.1 sdiff for degrees of freedom of 7, 8, and 9 and for degrees of freedom greater than 150.
It is good practice to calculate confidence intervals for Bland-Altman LoAs, especially if the sample size is small. The LoA calculated for a sample is only an estimate of the LoA for the population from which the sample is drawn, and it is useful for researchers and readers to have an understanding of how much the LoAs in the population may vary from the sample LoAs.
This article has described a number of approaches to calculating confidence intervals for Bland-Altman LoAs, and the method chosen by researchers will depend, to some extent, on what aspects of the data the researchers wish to highlight.
Occasionally, researchers might prefer to determine confidence intervals for LoAs considered individually, when they are especially concerned with inferences about either the top or the bottom of the range of differences. An example might be comparing a new tonometry technique with a preexisting standard such as Goldmann tonometry and considering the range over which the new technique underestimates Goldmann pressures (with a consequent risk of missing elevated intraocular pressure). In such situations, the coefficients in Table 1 may be useful.
However, for most applications, researchers will be concerned about both upper and lower LoAs and, under these circumstances, it is more appropriate to calculate the confidence limits for LoAs considered as a pair. This approach may fit best with the underlying principles of Bland-Altman analysis that generally involves determining a pair of LoAs. To illustrate where such calculations are useful, confidence intervals have been calculated for Bland-Altman LoAs for some previously published data sets.
The first comes from a recent study by Bandlitz et al.25 of techniques for measuring tear meniscus radius, comparing a newly developed technique (PDM) with an existing technique (VM). Fig. 3 shows a Bland-Altman plot comparing in vitro radius measurements taken from five glass capillary tubes using the two instruments. The figure is as originally published, with the addition of confidence intervals calculated using coefficients from Table 2, plotted as error bars on the LoAs. The sample size (n = 5) is a relatively small basis for estimating LoAs, and calculating confidence intervals for LoAs may be useful under such circumstances. The mean of differences was 0.0002 mm, with an estimated SDdiff of 0.0205 mm. The LoAs are shown to be 0.0404 mm (confidence interval, +0.0256 to 0.1264 mm) and −0.0400 mm (confidence interval, −0.0252 to −0.1260 mm). One could say, based on this sample, with 95% confidence, that, in the population, the LoAs could have been as far apart as +0.1264 and −0.1260 mm or as close together as +0.0256 and −0.0252 mm. The authors made no comment interpreting the LoAs beyond presenting the data but, if they had included the confidence intervals, readers would have seen that LoAs in the population are reasonably likely to lie considerably farther from the mean of differences. A range of ±0.126 mm may still be acceptable for LoAs in the science of meniscometry, but it is useful for readers to know the range when interpreting the research.
Fig. 4 shows Bland-Altman plots taken from a recent study of retinal oxygen saturation measurements obtained from specialized fundus imaging in 18 subjects.26 The figure shows intrasession variability in the measurements from retinal arteries in frame A and retinal veins in frame B. The error bars were not in the original figure but have been included to show the confidence limits for LoAs using the exact method and Table 2. Means of differences were 0.3% (with SDdiff of 5.0%) for arteries and −0.6% (with SDdiff of 8.0%) for veins. The LoAs (with 95% confidence intervals) were 10.1% (7.8% to 15.5%) and −9.6% (−7.2% to −14.9%) for arteries and 15.0% (11.4% to 23.6%) and −16.2% (−12.6% to −24.8%) for veins.
In the original article, the discussion section makes the observation that: “Furthermore, our analysis showed for the first time that there was no bias between (or within) recording sessions and that 95% LoAs are generally lower in arteries.” There is not, currently, an appropriate inferential statistic to determine the specific question of whether the LoAs differ from one data set to another (although a number of tests have been developed to assess the simpler question of whether variance is different between two samples27), but some guidance may be obtained from the confidence intervals. On the basis of the presented data, a claim that LoAs are different (for arteries and veins) is difficult to sustain because of the overlap between the confidence intervals.
This is in contrast to the next example, a pediatric ocular biometry report,12 in which the repeatability of ultrasound measurements (Echoscan) of axial length in children was compared with that of partial coherence interferometry measurements (IOLmaster). This is shown as repeatability Bland-Altman plots in Fig. 5, as originally published, except for the addition of confidence intervals for the LoAs (again calculated from Table 2), drawn as error bars. For 37 subjects, two repeated measurements were made with each instrument. The IOLmaster tends to give quite repeatable results, with a lower LoA of −0.05 mm (CI, −0.04 to −0.06 mm) and an upper LoA of +0.04 mm (CI, 0.03 to 0.05 mm). Echoscan gives less repeatable results. The mean of differences was −0.094 mm and the SDdiff was 0.388 mm. The lower LoA was −0.85 mm (CI, −0.71 to −1.09 mm), and the upper LoA was 0.67 mm (CI, 0.53 to 0.91 mm). In this case, the addition of confidence intervals supports the original article’s inference that the IOLmaster had better repeatability than Echoscan for measurements of axial length, because the outer 95% confidence limits for the IOLmaster LoAs lie well inside the inner 95% confidence limits for the Echoscan LoAs. In addition, the confidence intervals may help readers plan future research or make comparisons with other studies, because they provide estimates of where the LoAs might be expected to lie in the population.
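The informal comparisons in the retinal oximetry and axial length examples rest on whether the LoA confidence intervals overlap. As the text notes, this is only a rough guide and not a formal test. The sketch below simply checks overlap using the intervals reported for Figs. 4 and 5; the function and variable names are hypothetical.

```python
def overlaps(ci_a, ci_b):
    """True if two confidence intervals (bounds given in either order) share any values."""
    a_lo, a_hi = sorted(ci_a)
    b_lo, b_hi = sorted(ci_b)
    return a_lo <= b_hi and b_lo <= a_hi


# Fig. 4 (retinal oxygen saturation, %): LoA CIs for arteries vs veins
arteries = {"upper": (7.8, 15.5), "lower": (-7.2, -14.9)}
veins = {"upper": (11.4, 23.6), "lower": (-12.6, -24.8)}

# Fig. 5 (axial length, mm): LoA CIs for IOLmaster vs Echoscan
iolmaster = {"upper": (0.03, 0.05), "lower": (-0.04, -0.06)}
echoscan = {"upper": (0.53, 0.91), "lower": (-0.71, -1.09)}

for name, a, b in (("arteries vs veins", arteries, veins),
                   ("IOLmaster vs Echoscan", iolmaster, echoscan)):
    print(name, {side: overlaps(a[side], b[side]) for side in ("upper", "lower")})
```

The artery and vein intervals overlap on both sides, whereas the IOLmaster and Echoscan intervals do not.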
This article has described exact parametric methods for determining confidence intervals for LoAs. As an application to Bland-Altman analysis, the methods have only been partially described before using approximate coefficients,13 although the numerical techniques are well established.17,18,20 This exact parametric method may be best, but even an approximate calculation of confidence intervals for LoAs is likely to be better than none at all.
Researchers may find the new technique useful, especially for small sample sizes. Whether they decide to use confidence limits for LoAs either considered as a pair or considered individually, or exact or approximate methods, authors should always describe the method used for their calculations. In addition, authors should recognize that readers may wish to consider alternative inferences from their data. By reporting $\bar{d}$, $s_{\text{diff}}$, and n values, authors will give readers the information needed for alternative calculations of confidence intervals for LoAs.
Finally, authors should consider whether it is useful to include a set of error bars in their Bland-Altman plots to show confidence intervals for LoAs as in Figs. 1, 3, 4, and 5 or merely to report the confidence intervals as values. Although they are a convenient way to show the confidence intervals, error bars themselves can be interpreted in a number of different ways, for example, representing SDs or standard errors. Authors should take care to explain in figure legends what the error bars represent and the method used for calculating them.
School of Optometry and Vision Science
QUT, GPO Box 2434
Brisbane Qld 4001
Received: June 5, 2014; accepted December 12, 2014.
APPENDIX/SUPPLEMENTAL DIGITAL CONTENT
Appendix Table 1, listing coefficients for calculating exact 95% confidence intervals for 95% LoAs considered individually, for different degrees of freedom ν = n − 1, is available at http://links.lww.com/OPX/A197. Appendix Table 2, listing coefficients for calculating exact 95% confidence intervals for 95% LoAs considered as a pair, for different degrees of freedom ν = n − 1, is available at http://links.lww.com/OPX/A198. Supplemental Digital Content 1, the MATLAB Kcalculator code, for calculating coefficient values in Table 1, is available at http://links.lww.com/OPX/A199. Supplemental Digital Content 2, the MATLAB Ktcalculator code, for calculating coefficient values in Table 2 and Appendix Table 2, is available at http://links.lww.com/OPX/A200.