The current emphasis on early detection and prevention of chronic and acute diseases has led to development of new and sophisticated biomarkers. However, costs of evaluation of their effectiveness can sometimes limit the feasibility of such testing. For example, the interleukin-6 biomarker of inflammation has been suggested to have potential discriminatory ability for myocardial infarction. However, the cost of a single assay can be so high (20 or more times greater than storage and technician costs) that financial considerations will hinder attempts to evaluate the usefulness of the biomarker. Analysis of results based on smaller numbers of pooled specimens has been shown to be a useful cost-cutting technique, especially with microarray experiments.1–5
By pooling (ie, physically combining individual specimens), the amount of information per assay is increased while the number of assays needed to evaluate this information decreases.4,6 We assume that the measurement of a pooled sample adequately represents the average of the individual unpooled samples. Conveniently, this is often the case, as most tests are expressed per unit of volume.
Receiver operating characteristics (ROC) curves are used in biomedical research to evaluate the effectiveness of biomarkers for distinguishing individuals with disease from those without.7–10 The Youden index (J), a function of sensitivity (q) and specificity (p), is a commonly used measure of overall diagnostic effectiveness.7–13 This index ranges between 0 and 1, with values close to 1 indicating that the biomarker's effectiveness is relatively large and values close to 0 indicating limited effectiveness. Figure 1 shows that J is the maximum vertical distance or difference between the ROC curve and the diagonal or chance line. J is defined by
$$J = \max_{c}\,\{\,q(c) + p(c) - 1\,\} \qquad (1)$$
over all cut-points c, −∞ < c < ∞. If risk of disease is a monotonically increasing function of the marker level, sensitivity decreases and specificity increases with rising c. Thus, there is a penalty (decreased specificity) for increasing sensitivity too far. J occurs at the optimal cut-point for calling a patient diseased, maximizing the number of correctly classified individuals.11–14 On the other hand, the consequences of a positive or negative test result (ie, intervention) may be quite different, and the loss from missing a case may be greater than that from overcalling a control. In that situation, a differential weighting is needed to optimize J (Appendix A.1, available with the electronic version of the article).
With equal weight given to errors of sensitivity and specificity, J can be determined graphically by plotting fx and fy, the probability density functions of the cases and controls, respectively, for a continuously distributed biomarker (Fig. 2). J is the difference of the area under fx and fy to the right of the cut-point, with negative area when fy > fx. This area is identical to the difference of the area under fy and fx to the left of the cut-point. With unequal weights, in a ratio R, J can be seen as the difference between fy and Rfx.
For diagnostic purposes and decision-making, health practitioners dichotomize continuous biomarkers into healthy and diseased patients. The optimal cut-point, the value used to separate these groups, and J occur at an intersection between the probability density functions of cases and controls such that
$$f_x(c) = f_y(c), \qquad f_x(c-\varepsilon) < f_y(c-\varepsilon) \ \text{ and } \ f_x(c+\varepsilon) > f_y(c+\varepsilon) \qquad (2)$$
for some small ε > 0. This is true when the mean of the cases is greater than that of the controls. The second criterion is necessary when multiple intersections exist (for the proof, see Appendix A.1). From now on, c will denote the optimal cut-point resulting in J.
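As a rough numerical illustration of the two statements above (J as a maximum of q(c) + p(c) − 1, attained where the case and control densities cross), the following sketch searches a grid of cut-points for a pair of normal distributions. The parameter values and the function name `youden_grid` are illustrative assumptions, not quantities from the study.

```python
from statistics import NormalDist

def youden_grid(cases, controls, lo, hi, steps=20000):
    """Grid search for the cut-point maximizing q(c) + p(c) - 1."""
    best_c, best_j = lo, -1.0
    for k in range(steps + 1):
        c = lo + (hi - lo) * k / steps
        q = 1.0 - cases.cdf(c)   # sensitivity: P(case marker >= c)
        p = controls.cdf(c)      # specificity: P(control marker < c)
        j = q + p - 1.0
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j

# Hypothetical marker: cases ~ N(2, 1.5^2), controls ~ N(0, 1)
cases = NormalDist(mu=2.0, sigma=1.5)
controls = NormalDist(mu=0.0, sigma=1.0)
c_opt, j_opt = youden_grid(cases, controls, lo=-5.0, hi=10.0)

# At the maximizing cut-point the two densities should nearly coincide.
density_gap = abs(cases.pdf(c_opt) - controls.pdf(c_opt))
```

A grid search is used here only because it mirrors the definition directly; it makes no use of the distributional form beyond the two distribution functions.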
Due to the high costs entailed by some biomarkers, several authors have proposed the use of pooling, and have evaluated ROC curve analysis when dealing with such data.6,15 However, the effect of pooling on J and c has yet to be explored.
In this paper, we extend the work of Faraggi et al6 and Liu and Schisterman15 to evaluation of c and J under various distributional assumptions. We examine the effect of pooling on the efficiency of c and J estimation. In the circumstances when a fixed number of individuals are available for testing and samples are pooled to reduce the number of assays (thus lowering the costs), a loss of information is expected. We measure this loss of information by the change in root mean squared error (RMSE) of the estimate of c and J, and examine the extent of this loss via a simulation study.
We also examine the situation where the number of assays available is fixed and pooling is used to increase the information per assay. This procedure may improve the accuracy of the estimate and is specifically applicable in cases where assaying cost significantly exceeds the cost of obtaining samples.
Inference on Youden Index and Optimal Cut-point Based on Pooled Data: Normal Assumptions
Assume that the responses of a specific biomarker are normally distributed, such that the cases (X), or true positives, have mean μx and variance σx2, and the controls (Y), or true negatives, have mean μy and variance σy2, and μx > μy. For μx < μy, one may simply switch the cases with controls in the following analysis. Under these assumptions, sensitivity (q(c)) and specificity (p(c)) can be written as
$$q(c) = \Phi\!\left(\frac{\mu_x - c}{\sigma_x}\right) \qquad (3)$$
$$p(c) = \Phi\!\left(\frac{c - \mu_y}{\sigma_y}\right) \qquad (4)$$
for a given cut-point c, where Φ denotes the standard normal distribution function. Accordingly, test measurements falling below c are negative results and those at or above c are positive.
As stated in the previous section, the optimal cut-point (and thus J) occurs at an intersection of the probability density functions of cases and controls. The number of intersections is a function of the variances of the cases and controls. One simple case is that of equal variance in cases and controls, σx2 = σy2, where only one intersection exists and c is simply the midpoint between the means, (μx + μy)/2. In the case of unequal variance, the intersections can be found by the following quadratic equation:
$$c = \frac{\mu_y\,(b^2 - 1) - a \pm b\,\sqrt{a^2 + (b^2 - 1)\,\sigma_y^2\,\ln(b^2)}}{b^2 - 1} \qquad (5)$$
where a = μx − μy and b = σx/σy. To find J, let us first order the intersections, c1 < c2. If b > 1, then J occurs at c2; alternatively, if b < 1, then J occurs at c1. Now, using Eq 3 for sensitivity and Eq 4 for specificity, J can be found from the appropriate cut-point.
When data on both cases and controls are available, appropriate estimates for μx, σx2, μy, σy2 can be calculated. These parameter estimates, when substituted into Eq 3, Eq 4, and Eq 5, yield the estimate ĉ and subsequently Ĵ.
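The steps of this section can be sketched in a few lines of Python. The function below equates the two normal densities, solves the resulting quadratic in terms of a = μx − μy and b = σx/σy, picks the root indicated by the ordering rule (c2 if b > 1, c1 if b < 1), and evaluates J from the normal sensitivity and specificity. The function name and the example parameter values are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def normal_cutpoint_and_j(mu_x, sd_x, mu_y, sd_y):
    """Optimal cut-point c and Youden index J for normal cases (x) and
    controls (y), assuming mu_x > mu_y."""
    a = mu_x - mu_y
    b = sd_x / sd_y
    if math.isclose(b, 1.0):
        c = (mu_x + mu_y) / 2.0  # equal variances: midpoint of the means
    else:
        # Roots of the quadratic obtained by equating the two densities
        root = b * math.sqrt(a**2 + (b**2 - 1.0) * sd_y**2 * math.log(b**2))
        c1, c2 = sorted(((mu_y * (b**2 - 1.0) - a - root) / (b**2 - 1.0),
                         (mu_y * (b**2 - 1.0) - a + root) / (b**2 - 1.0)))
        c = c2 if b > 1.0 else c1
    std = NormalDist()
    q = std.cdf((mu_x - c) / sd_x)  # sensitivity at c
    p = std.cdf((c - mu_y) / sd_y)  # specificity at c
    return c, q + p - 1.0

# Hypothetical parameters, not estimates from the study
c_hat, j_hat = normal_cutpoint_and_j(2.0, 1.5, 0.0, 1.0)
```

In practice the four arguments would be replaced by sample estimates, which yields ĉ and Ĵ.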
Suppose that the data available are in the form of pooled samples, obtained as follows. First, individuals of similar disease status (ie, cases with cases, controls with controls) are randomly placed into groups of size g. Then, grouped individual specimens are combined as pooled samples and are tested as single observations. Assuming that the specimens are measured per unit of volume, a pool's measurement is then considered as the average of its members' measurements. This has been shown to be a reasonable assumption.4
Consider the instance where there are n and m pooled observations available of cases and controls, respectively, with groups of size g. Let XPi, i = 1,...,n, denote cases, and YPj, j = 1,...,m denote controls such that
$$X_{Pi} = \frac{1}{g}\sum_{k=1}^{g} X_{(i-1)g+k}, \qquad Y_{Pj} = \frac{1}{g}\sum_{k=1}^{g} Y_{(j-1)g+k}.$$
Consequently, from the additive property of the normal distribution, we have μxp = μx, σ2xp = σ2x/g, μyp = μy, σ2yp = σ2y/g.
Using the usual notation, let x̄p, Sxp, ȳp, and Syp denote the standard estimates of μxp, σxp, μyp, and σyp, respectively. The parameters of the unpooled distributions can then be estimated by μ̂x = x̄p, σ̂x = √g Sxp, μ̂y = ȳp, and σ̂y = √g Syp. Substituting these estimates for the parameters in the above equations yields the estimates ĉ and Ĵ based on pooled data.
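The back-transformation from pooled to unpooled parameters can be illustrated with simulated data. In this sketch, pooling is mimicked by averaging consecutive groups of g simulated individual values, which relies on the averaging assumption stated above; the helper names, the seed, and the parameter values are illustrative assumptions.

```python
import math
import random
from statistics import mean, stdev

def pool(values, g):
    """Average consecutive groups of g values (values assumed pre-shuffled)."""
    return [mean(values[i:i + g]) for i in range(0, len(values) - g + 1, g)]

def unpooled_estimates(pooled, g):
    """Back-transform the pooled sample mean and SD to the individual scale."""
    return mean(pooled), math.sqrt(g) * stdev(pooled)

random.seed(1)
g = 4
individuals = [random.gauss(2.0, 1.5) for _ in range(4000)]  # simulated cases
pooled = pool(individuals, g)
mu_hat, sd_hat = unpooled_estimates(pooled, g)
```

With 1000 pools of size 4, `mu_hat` and `sd_hat` should land close to the simulated values 2.0 and 1.5, even though no individual measurement is used in the estimation.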
Inference on Youden Index and Optimal Cut-point Based on Pooled Data: Gamma Assumptions
The assumption of normality is often not justifiable in practice. Some biomarkers are skewed right and are best represented by some form of the gamma distribution. Suppose that the responses to a given biomarker follow a gamma distribution such that cases are gamma(αX, βX) and controls are gamma(αY, βY). Based on these distributional assumptions, sensitivity (q(c)) and specificity (p(c)) can be calculated as shown in Appendix A.2 (available with the electronic version of the article).
As with the normal case, J is realized at an intersection of the probability density functions. When case and control responses follow a gamma distribution, a single intersection frequently exists, the location of which defines the optimal cut-point c. Some special cases are
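1. αx = αy = α, then
$$c = \frac{\alpha\,\beta_x\,\beta_y\,\ln(\beta_x/\beta_y)}{\beta_x - \beta_y}$$
2. βx = βy = β, then
$$c = \beta\left[\frac{\Gamma(\alpha_x)}{\Gamma(\alpha_y)}\right]^{1/(\alpha_x - \alpha_y)}$$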
Otherwise, the intersection must be found numerically. When 2 intersections exist, c is located by the previous criteria (Eq 2). Now, J can be calculated at c by substituting Eq 6 and Eq 7 (Appendix A.2) for sensitivity and specificity in Eq 1.
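One possible way to carry out the numerical step is sketched below, using only the Python standard library. It assumes the gamma distribution is parameterized by shape α and scale β, locates the density crossing by bisection, and evaluates the distribution function with the power series for the lower incomplete gamma function. The function names, the bracketing interval, and the parameter values are assumptions for illustration; the expressions of Appendix A.2 are not reproduced here.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and scale beta (assumed parameterization)."""
    return math.exp((alpha - 1) * math.log(x) - x / beta
                    - math.lgamma(alpha) - alpha * math.log(beta))

def gamma_cdf(x, alpha, beta, terms=500):
    """Regularized lower incomplete gamma function via its power series."""
    z = x / beta
    term = 1.0 / alpha
    total = term
    for n in range(1, terms):
        term *= z / (alpha + n)
        total += term
    return total * math.exp(alpha * math.log(z) - z - math.lgamma(alpha))

def gamma_cutpoint_and_j(ax, bx, ay, by, lo, hi, tol=1e-10):
    """Bisection on f_y - f_x over [lo, hi]; the bracket must contain the
    crossing where controls dominate on the left and cases on the right."""
    diff = lambda c: gamma_pdf(c, ay, by) - gamma_pdf(c, ax, bx)
    assert diff(lo) > 0 > diff(hi), "bracket does not straddle the crossing"
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if diff(mid) > 0:
            lo = mid
        else:
            hi = mid
    c = (lo + hi) / 2.0
    q = 1.0 - gamma_cdf(c, ax, bx)  # sensitivity at c
    p = gamma_cdf(c, ay, by)        # specificity at c
    return c, q + p - 1.0

# Hypothetical parameters: cases gamma(2, 2), controls gamma(2, 1)
c_gam, j_gam = gamma_cutpoint_and_j(2.0, 2.0, 2.0, 1.0, lo=0.1, hi=10.0)
```

The sign check on the bracket encodes the requirement that the control density exceed the case density just below c and fall below it just above c, so a second intersection outside the bracket is ignored.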
Suppose again that pooled specimens are available with the pooling process being the same as defined earlier. Again, we let XPi, i = 1,...,n and YPj, j = 1,...,m denote the pooled observations of cases and controls, respectively. Using the additive property of the gamma distribution, we have
$$X_{Pi} \sim \mathrm{gamma}(\alpha_{XP}, \beta_{XP}), \qquad Y_{Pj} \sim \mathrm{gamma}(\alpha_{YP}, \beta_{YP}),$$
each having a pooling size g. We will continue to assume that the measure of a pooled observation is the mean of the g unpooled measures. Consequently, αXP = g× αX, αYP = g× αY, βXP = βX/g and βYP = βY/g.
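These relations can be checked with a short derivation, sketched here for the cases under the assumption that the individual measurements in a pool are independent and that β is a scale parameter (which is what the stated relations imply); the controls follow in the same way.

```latex
% Assumes X_1, ..., X_g independent gamma(\alpha_X, \beta_X), \beta_X a scale parameter
\begin{align*}
\sum_{k=1}^{g} X_k &\sim \mathrm{gamma}(g\alpha_X,\ \beta_X)
  && \text{(sum of independent gammas with a common scale)} \\
X_P = \frac{1}{g}\sum_{k=1}^{g} X_k &\sim \mathrm{gamma}\!\left(g\alpha_X,\ \frac{\beta_X}{g}\right)
  && \text{(rescaling by } 1/g \text{ rescales } \beta_X\text{)} \\
E[X_P] &= g\alpha_X \cdot \frac{\beta_X}{g} = \alpha_X \beta_X,
\qquad
\mathrm{Var}[X_P] = g\alpha_X \cdot \frac{\beta_X^2}{g^2} = \frac{\alpha_X \beta_X^2}{g}
\end{align*}
```

The pooled mean therefore matches the individual mean, while the variance shrinks by a factor of g, in parallel with the normal case.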
The maximum likelihood estimates α̂PX, β̂PX, α̂PY, and β̂PY can be obtained numerically from the observations on the pooled specimens. Using these estimators and the association between the distributions of pooled and unpooled observations, estimates of the unpooled distribution parameters can be obtained by α̂X = α̂PX/g, β̂X = g × β̂PX, α̂Y = α̂PY/g, and β̂Y = g × β̂PY. The estimates ĉ and Ĵ can now be obtained by substituting these estimates for the parameters and following the steps outlined previously.
To fully explore the effects of pooling on the estimates ĉ and Ĵ, we conducted simulation studies by generating data (cases and controls) from either normal or gamma distributions. Since ĉ and Ĵ are functions of the intersection of the probability density functions for cases and controls, the parameters selected represent a wide variety of distributional conditions (normal and gamma) exemplified by different levels of separation (J = 0.2, 0.4, 0.6, 0.8). While simulations at all J levels are presented, analysis will focus primarily on J of 0.6 and 0.8, the “useful” levels of better diagnostic biomarkers. Our simulations were limited to pool sizes of 2 or 4 because pool sizes of 5 and above result in a loss of identifiable skewness, due to the central limit theorem. A summary of our investigation is presented in Tables 1-4. We considered 2 common general conditions regarding availability of samples in an experimental setting.6 The first involves fixing the number of study subjects (N = M = 40, 100, 200), and the second fixes the number of assays (n = m = 40, 100, 200). We generated 2000 individual samples from each set of parameters. Percent bias and relative root mean squared error (RMSE) were then determined by comparing estimates to the true c and J (calculated using the true parameter values) as follows:
where ĉ is the estimated optimal cut-point. %Bias(Ĵ) was calculated in the same manner; and
for unpooled data, and
for pooled data of size 2 or 4.
The first condition, when the number of subjects available is fixed, looks at the degradation of the estimate ĉ as pooling size increases (g = 1,2,4), resulting in a decrease in the number of tested samples (n = N/g). For instance, 40 unpooled control specimens are converted to 20 pooled specimens, each consisting of a randomly chosen pair of controls (g = 2), or to 10 pooled specimens, each consisting of a randomly chosen tetrad of controls (g = 4). The same procedure is applied to the case population.
Under normality assumptions (Table 1), the percent bias in the estimate of the optimal cut-point was negligible at all levels of discrimination and pooling, even for small sample sizes. As expected, the relative RMSE increased with pool size. No considerable distinction could be made between the RMSE from unpooled data (g = 1) and pooled data (g = 2) for J = 0.6 and 0.8. However, for g = 4, the relative loss of efficiency is 3 times that of pairs. This is the effect of the central limit theorem and is to be expected when cutting the sample by 75%.
In the gamma case (Table 2), the percent bias and relative RMSE increase in magnitude as g increases and, consequently, n and m decrease. The increase in bias due to pooling is negligible for J = 0.4, 0.6, and 0.8. Relative RMSE increases for g = 2 are on par with those of the normal case, but g = 4 results are consistently 10% higher than the normal tetrads. The positive bias for estimates based on the unpooled as well as on the pooled data greatly attenuates as sample size is increased. This is a result of using maximum likelihood estimators to estimate the optimal cut-point under small samples. Moreover, the bias is largely reduced for J = 0.4, 0.6, and 0.8, which are the markers of scientific interest, even for small sample sizes.
Biomarkers with poor distinguishing ability (eg, J = 0.2), also behave poorly under pooling. For example, when 40 unpooled samples are pooled in pairs, the RMSE increases by 37% for the gamma case. More generally, this relationship is true for both normal and gamma cases.
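The fixed-subject condition can be mimicked with a small Monte Carlo sketch under normality. This is not the simulation code behind Tables 1 and 2: the true parameters are hypothetical, the number of replicates is reduced, pooling is simulated by averaging, and the quantity reported is the plain RMSE of ĉ rather than the relative RMSE used in the tables.

```python
import math
import random
from statistics import mean, stdev

def cutpoint(mu_x, sd_x, mu_y, sd_y):
    """Density-intersection cut-point for normal cases/controls (unequal SDs)."""
    a, b = mu_x - mu_y, sd_x / sd_y
    s = b * math.sqrt(a**2 + (b**2 - 1) * sd_y**2 * math.log(b**2))
    roots = sorted((mu_y + (-a - s) / (b**2 - 1), mu_y + (-a + s) / (b**2 - 1)))
    return roots[1] if b > 1 else roots[0]

def pooled_fit(mu, sd, n_subjects, g, rng):
    """Simulate n_subjects individuals, pool in groups of g, back-transform."""
    x = [rng.gauss(mu, sd) for _ in range(n_subjects)]
    pools = [sum(x[i:i + g]) / g for i in range(0, n_subjects, g)]
    return mean(pools), math.sqrt(g) * stdev(pools)

def rmse_fixed_subjects(g, n_subjects=40, reps=1000, seed=0):
    """RMSE of the estimated cut-point when n_subjects per group is fixed."""
    rng = random.Random(seed)
    true_c = cutpoint(2.0, 1.5, 0.0, 1.0)  # hypothetical true parameters
    sq_err = []
    for _ in range(reps):
        mx, sx = pooled_fit(2.0, 1.5, n_subjects, g, rng)  # cases
        my, sy = pooled_fit(0.0, 1.0, n_subjects, g, rng)  # controls
        sq_err.append((cutpoint(mx, sx, my, sy) - true_c) ** 2)
    return math.sqrt(mean(sq_err))

rmse_by_g = {g: rmse_fixed_subjects(g) for g in (1, 2, 4)}
```

Because the total number of subjects is held at 40 per group, larger pools leave fewer assayed values from which to estimate the standard deviations, which is where the loss of efficiency comes from.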
Under the second condition, when the number of assays to be performed is fixed, pooling effectively increases the overall sample size and the amount of information, via an increase in N(N = n · g) (Tables 3 and 4).
Again, bias remains unaffected, less than 1% bias for all levels of pooling for “useful” J. As pooling size increases, there is a consistent reduction in RMSE. For the normal case, as the level of pooling increases (g = 1,2,4), the RMSE for “useful” J substantially decreases (about half for pools of 4). Likewise, under gamma assumptions, as the level of pooling increases, g = 1,2,4, the benefits in RMSE are substantial (40% decrease for pools of 4). Pooling when J = 0.2 and 0.4 reveals a less dramatic benefit in RMSE.
These methods provide a useful tool for making inferences about unpooled samples when assays are based on pooled specimens. This is more clearly seen through use of an example, as illustrated below.
Evidence shows that inflammation may play a contributing role in the development of coronary heart disease (CHD). Interleukin-6 has been linked with the presence of infections in the vessel wall and with atherosclerosis.16,17 Moreover, epidemiologic data implicate infection in remote sites in the etiology of CHD.
Individual measurements of interleukin-6 on 80 volunteers were obtained at Cedars-Sinai Medical Center. Forty individuals who recently (within 2 weeks from the event) survived a myocardial infarction (MI) were defined as cases, after being confirmed by rest electrocardiogram (ECG) and laboratory measurements; the remaining 40 subjects served as controls. The controls had a normal rest ECG, were free of symptoms and had no previous cardiovascular procedures or MIs. In addition, the blood specimens were randomly pooled in groups of 2 and 4, for the cases and the controls separately, and remeasured. Faraggi et al6 have shown, using the same data, that for interleukin-6 the assumption that the pooled sample measurements are the equivalent of the average of the individual cases is justified. Due to the costs involved such confirmatory evidence for the averaging assumption will generally not be available.
Distributional assumptions were also tested and found to fit well with gamma assumptions, confirming the findings of Faraggi and coauthors.6 The mean (± SD) in the control and case unpooled samples, respectively, were 1.85 (±1.37) and 4.29 (±2.18). Youden index and cut-point were estimated using the method described previously under gamma assumptions. Table 5 shows that the Youden index was approximately 0.5 for unpooled and pooled data. More importantly, the optimal cut-point was estimated to be 2.41 for unpooled data and was not very much affected by pooling, as shown in Figures 1 and 3. A 95% bootstrapped confidence interval based on unpooled data was estimated to be 1.8 to 3.6, containing both estimates (2.06 [g = 2] and 2.70 [g = 4]) based on pooled data, despite the small number of specimens.
In this paper, we have presented a method to estimate the Youden index and the optimal cut-point and extended its applications to pooled samples. We extend the work of Faraggi et al6 and Liu and Schisterman15 to the cut-point, c, and Youden Index, J, under various distributional assumptions. We have shown that pooling is a statistically viable cost-saving approach, through a reduction in the number of assays required, especially with pool sizes of 2 and 4.
Most other statistical methods currently available for the analysis of biomarkers deal with comparison of proportions between cases and controls and power analysis, eg, for a genotype.4,19 Our methods are specific for continuous data, where finding the optimal cut-point is an important issue.
Relation Between Youden Index and The Likelihood Ratio
It is of interest to note that, since the Youden index of a continuous biomarker is a function of sensitivity and specificity, its relation to the positive and negative likelihood ratios may be useful. Graphically, the likelihood ratio positive (LR+) is the slope (q/(1 − p)) of the line through the origin and a point on the ROC curve, while the likelihood ratio negative (LR−) is the slope ((1 − q)/p) of the line through (1,1) and the same point on the ROC curve. The product of the likelihood ratios, q(1 − q)/[p(1 − p)], is the slope of the angle bisector. The Youden index, J, occurs at the point at which the product of the 2 likelihood ratios is equal to 1 or when the tangent to the ROC curve is parallel to the chance line (Fig. 1). Also, confidence intervals for c and J can be easily obtained using bootstrap methods and statistical software that is currently available.18
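The two slopes can be computed directly from a sensitivity and specificity pair. The sketch below uses a hypothetical operating point rather than values from the study; the function name is an assumption.

```python
def likelihood_ratios(q, p):
    """Return (LR+, LR-) for sensitivity q and specificity p."""
    lr_pos = q / (1.0 - p)   # slope from (0, 0) to the ROC point (1 - p, q)
    lr_neg = (1.0 - q) / p   # slope from the ROC point (1 - p, q) to (1, 1)
    return lr_pos, lr_neg

# Hypothetical operating point, not taken from the study data
lr_pos, lr_neg = likelihood_ratios(q=0.80, p=0.70)
```

Evaluating the function at the sensitivity and specificity attained at ĉ gives the likelihood ratios associated with the optimal cut-point.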
Correct implementation of the method developed in this paper requires 2 assumptions if the researcher sees only the pooled data. The first assumption is that the value obtained from a pooled assay can be considered to be the average of the individual values of the pooled specimens. There is both a biologic and a methodological aspect to this assumption. Biologically, this assumption can be deemed reasonable based on expert knowledge of the biomarker. If, for example, because of the molecular structure of the biomarker, pooling blood samples might yield a statistic other than the average (eg, maximum), then this methodology is inappropriate for the evaluation of the optimal cut-point and the Youden index. On the other hand, when this assumption is reasonable biologically, differences between the pooled sample and the average of individual specimens are due to “random measurement error,” defined as the random variability that leads to inaccuracy in the estimation of the true mean value. For instance, if the volumes of the individual specimens to be pooled are not equal, the pooled measurement will be a volume-weighted average of the individual biomarker values. For normally distributed biomarkers, the addition of measurement error with mean zero and variance σε2 will affect the estimate ĉ in one of 3 ways, depending on the ratio σX2/σY2. If σX2/σY2 = 1, then ĉ will remain unbiased, because the location where the 2 distributions intersect would remain unchanged (Fig. 2). If σX2/σY2 > 1, then ĉ will be positively biased, and similarly, if σX2/σY2 < 1, then ĉ will be negatively biased. For biomarkers that follow a gamma distribution, measurement error will always cause a positive bias in ĉ. This is due to the dependent relationship between the mean and variance of gamma distributions. Also, measurement error always results in an attenuation of Ĵ.
Since J is a measure of differentiation between cases and controls, it is intuitive that when error is introduced the ability to differentiate decreases.
The second assumption is that the unpooled biomarkers follow a known parametric distribution. A more formal evaluation of distributional assumption would be possible using a moment-based estimating-equation approach to deal with situations where likelihood functions based on pooled data are difficult to work with. We outlined the method to obtain estimates and test statistics of the parameters of interest in the general setting. We demonstrated the approach on the family of distributions generated by the Box-Cox transformation model, and, in the process, construct tests for goodness of fit based on the pooled data. Nevertheless, in our experience, the researcher will often develop some sense of both these assumptions during the early stages of the biomarker development by means of a validation study.
Pooling sizes of 5 and above, while fiscally attractive, are prone to 2 difficulties. The first is a consequence of the central limit theorem; averages tend to be more normally distributed as sample size increases. Identifying a biomarker's unpooled distribution is difficult because the central limit theorem hinders our ability to distinguish between a skewed and a symmetric distribution. The second difficulty arises only when a fixed number of subjects is reduced by pooling to an unreasonably small number of assays, rendering the parameter estimation unreliable. For instance, in the example presented above, we had 40 cases and 40 controls contributing blood samples. If g = 10, then we are left with 8 assays (4 cases and 4 controls) on which to estimate the means and standard deviations necessary for ĉ and Ĵ.
This method is relevant to studies of markers for early detection and prevention of disease and for studies of markers of exposure and disease in molecular epidemiology when, for example, deciding whether a biomarker is worth pursuing further or is ready for a study. Furthermore, once this method is applied and a biomarker demonstrates discriminatory ability, the optimal cut-point can be used in clinical practice to classify patients as healthy or diseased, after proper validation.
In summary, we showed that estimating c and J under pooling is a cost-effective, statistically sound approach for evaluating biomarkers. Such estimation has potential applications for research and clinical practice and for hypothesis development.
1.Farrington C. Estimating prevalence by group testing using generalized linear models. Stat Med
2.Tu X, Litvak E, Pagano M. On the informativeness and accuracy of pooled testing in estimating prevalence of a rare disease: application to HIV screening. Biometrika
3.Barcellos L, Klitz W, Field L, et al. Association mapping of disease loci, by use of a pooled DNA genomic screen. Am J Hum Genet
4.Weinberg CR, Umbach DM. Using pooled exposure assessment to improve efficiency in case-control studies. Biometrics
5.Kendziorski CM, Zhang Y, Lan H, Attie AD. The efficiency of pooling mRNA in microarray experiments. Biostatistics
6.Faraggi D, Reiser B, Schisterman EF. ROC curve analysis for biomarkers based on pooled assessments. Stat Med
7.Zou KH, Hall WJ, Shapiro DE. Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests. Stat Med
8.Zweig MH, Campbell G. Receiver operator characteristic (ROC) plots; a fundamental evaluation tool in clinical medicine. Clin Chem
9.Goddard MJ, Hinberg I. Receiver operator characteristic (ROC) curves and non-normal data: an empirical study. Stat Med
10.Wieand S, Gail MH, James BR, James KL. A family of non-parametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika
11.Youden WJ. An index for rating diagnostic tests. Cancer
12.Barkan N. Statistical inference on r*specificity + sensitivity. Doctoral Dissertation 2001. Haifa University.
13.Bamber DC. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. J Math Psychol
14.Hilden J, Glasziou P. Regret graphs, diagnostic uncertainty and Youden's Index. Stat Med
15.Liu A, Schisterman EF. Comparison of diagnostic accuracy of biomarkers with pooled assessments. Biom J
16.Chilton RJ. Recent discoveries in assessment of coronary heart disease: impact of vascular mechanisms on development of atherosclerosis. J Am Osteopath Assoc
17.Yudkin JS, Kumari M, Humphries SE, Mohamed-Ali V. Inflammation, obesity, stress and coronary heart disease: is interleukin-6 the link? Atherosclerosis
18.Carpenter J, Bithell J. Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Stat Med
19.Peng X, Wood CL, Blalock EM, et al. Statistical implications of pooling RNA samples for microarray experiments. BMC Bioinformatics
Assume that cases, X, and controls, Y, are represented by continuous unimodal distributions, and μy < μx. Let c0 be some cut-point and ci (i = 1,2) be the ith intersection of the probability density functions denoted by f. Youden index (J) is found by
The intervals for which fy > fx and fy < fx are determined by the variances of the distributions. Assuming σx2 > σy2 could result in 1 or 2 intersections. The 2-intersection case follows
For c0 in (−∞, c1)
Similarly, for c0 in (c1, c2)
And, for c0 in (c2, ∞)
A similar argument proves that when a single intersection exists, the intersection is the cut-point for J. For the case where σx2 < σy2, this approach yields c1 as the optimal cut-point used for J.
Note: Using Figure 1 as a reference, it can be seen that moving the cut-point to the right would result in a loss in shaded area (Youden index). Since the Youden index can be represented by the area between the 2 curves to either the right or left of the cut-point, moving the cut-point to the left also results in a decrease.
© 2005 Lippincott Williams & Wilkins, Inc.