Measuring the plasma HIV RNA level of patients with HIV every 3 months is the standard of care in resource-rich settings; it alerts providers to the potential need for enhanced adherence interventions and/or a change to a failing antiretroviral therapy (ART) regimen before accumulation of resistance mutations, disease progression, and death.1–3 Routine serial HIV RNA testing for all patients on ART, however, is expensive, and the majority of tests in patients on stable regimens yield undetectable HIV RNA.4–6
If sufficiently sensitive, a selective testing approach in which plasma HIV RNA is measured only when screening criteria suggest increased risk of virological failure might be used to reduce expense related to HIV RNA testing. However, several screening rules proposed to date, based on either clinical data alone or some combination of clinical data and self-reported adherence data, had low sensitivity (20%–67%), particularly in validation populations.4–6 Pharmacy refill data were reported to classify failure with greater accuracy than self-reported adherence data,7 suggesting that alternate approaches to measuring adherence may improve a selective HIV RNA testing strategy.
Previous studies examining associations between adherence and virological failure have generally focused on a single summary of adherence data, such as average adherence over some interval preceding HIV RNA assessment.7–12 However, adherence patterns are myriad and the association of any 1 pattern with virological failure is unclear.8 Furthermore, measurement of adherence, whether based on self-report, pill count, or electronic medication container opening, is inherently imperfect. Finally, electronic adherence monitoring methods, some of which transmit data in real time, are becoming more widely available.13 In summary, it is unknown how best to combine adherence and clinical data to predict virological failure.12,14–17 To address these challenges, we built prediction models for virological failure using Medication Event Monitoring System (MEMS) and clinical data analyzed with Super Learner, a data-adaptive algorithm based on cross-validation (ie, multiple internal data splits).18,19
We investigated the potential for pill container openings recorded using MEMS to correctly classify virological failure in a large clinically and geographically heterogeneous population of patients with HIV treated with ART in the United States. We investigated (1) the extent to which a machine-learning method (Super Learner) applied to MEMS adherence and clinical data improved classification of failure beyond a single time-updated MEMS adherence summary; (2) the extent to which addition of MEMS to basic clinical data improved classification of failure; and (3) the potential for the resulting risk score to reduce frequency of HIV RNA measurements while detecting at least 95% virological failures.
Patient Population and Outcome
We studied HIV-positive patients in the Multisite Adherence Collaboration on HIV-14 (MACH14), who underwent ART adherence monitoring with MEMS between 1997 and 2009. MEMS monitoring consists of a date and time stamp recorded electronically with each pill container opening, subsequently downloaded to a database through USB. Details on the MACH14 population have been reported elsewhere.20 In brief, 16 studies based in 12 US states contributed longitudinal clinical, MEMS, and HIV RNA data to the consortium. Subjects were eligible for inclusion in our analyses if they had at least 2 plasma HIV RNA measurements during follow-up, with at least one of the postbaseline measurements preceded by MEMS monitoring in the previous month.
We considered 2 virological failure outcomes, defined as a single HIV RNA level >400 copies per milliliter and >1000 copies per milliliter, respectively, the latter to improve specificity for true failure versus transient elevations (ie, “blips”21) and facilitate comparison with previous studies.5,7,22 Baseline (first available) HIV RNA level was used as a predictor variable in some models, as detailed below. Postbaseline HIV RNA levels were included as outcome variables if they were preceded by MEMS monitoring in the previous month. Follow-up was censored at first-detected virological failure after baseline. Subjects could contribute multiple postbaseline HIV RNA tests as outcomes.
Candidate Predictor Variables
The first prediction model (“Clinical”) was built using the following nonvirological candidate variables: time since study enrollment; baseline, nadir, and most recent CD4 count; time since most recent CD4 count; ART regimen class (nonnucleoside reverse transcriptase inhibitor, boosted protease inhibitor, nonboosted protease inhibitor, and “other” regimens; Table 1); an indicator of change in regimen class in the preceding 2 months; and time since any regimen change. Study site was excluded because, were the algorithm used in practice, site-specific fits would not generally be available. Additional potential predictor variables, including time since diagnosis and full treatment history, were missing for a large proportion of subjects (Table 1) and were not included as candidate predictors in this analysis.
The second prediction model (“Electronic Adherence/Clinical”) augmented the set of candidate predictors used in the nonvirological clinical model with summaries of past MEMS data. MEMS data were used to calculate adherence summaries accounting for daily prescribed dosing over a set of intervals ranging from 1 week to 6 months preceding each outcome HIV RNA test. Dates of reported device nonuse (due, for example, to hospitalization or imprisonment) were excluded from adherence calculations.
Adherence summaries included percent doses taken (number of recorded doses/number of prescribed doses), number of interruptions (consecutive days with no recorded doses) of 1, 2, 3, and 4 days' duration, variance across days of percent doses taken per day, and versions of these summaries using only weekday and (for all but interruptions of >2 days) only weekend measurements. MEMS summaries for intervals over which no MEMS measurements were made were carried forward as their last observed value, and indicators of carry forward were included in the predictor set. This approach, unlike multiple imputation, reflects the data that would be available to inform a testing decision for a given patient. Number of drugs monitored, drug and drug class monitored, and number of days monitored with MEMS during each interval were also included.
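To make the summaries above concrete, the following Python sketch computes percent doses taken, interruption counts, and day-to-day variance from a list of device-opening dates. It assumes once-daily dosing for simplicity; the function and field names are ours, not MACH14's, and the actual analyses were performed in R and MATLAB.

```python
from datetime import date, timedelta

def adherence_summaries(openings, start, end, doses_per_day=1):
    """Illustrative MEMS-style adherence summaries over [start, end],
    assuming once-daily dosing. Not the study's actual code."""
    days = (end - start).days + 1
    # count recorded openings per calendar day in the window
    taken_per_day = {start + timedelta(d): 0 for d in range(days)}
    for ts in openings:
        if start <= ts <= end:
            taken_per_day[ts] += 1
    # per-day adherence, capped at 100% of prescribed doses
    daily_pct = [min(n, doses_per_day) / doses_per_day
                 for n in taken_per_day.values()]
    pct_taken = sum(daily_pct) / days
    # interruptions: maximal runs of consecutive days with no openings,
    # tallied at thresholds of >=1, >=2, >=3, >=4 days
    interruptions = {1: 0, 2: 0, 3: 0, 4: 0}
    run = 0
    for d in sorted(taken_per_day):
        if taken_per_day[d] == 0:
            run += 1
        else:
            for k in interruptions:
                if run >= k:
                    interruptions[k] += 1
            run = 0
    for k in interruptions:  # close out a trailing run
        if run >= k:
            interruptions[k] += 1
    # variance across days of per-day percent doses taken
    var = sum((p - pct_taken) ** 2 for p in daily_pct) / days
    return {"pct_taken": pct_taken,
            "interruptions": interruptions,
            "daily_var": var}
```

Weekday/weekend variants and the carry-forward indicators described above would be computed analogously by filtering the window's days before summarizing.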
Comparison of “Clinical” and “Electronic Adherence/Clinical” model performance evaluated the added value of MEMS data. We also investigated further gains from adding a single baseline HIV RNA measurement (“Baseline HIV RNA/Electronic Adherence/Clinical”) and, because extent of past virological suppression can reflect adherence behavior and impact likelihood of resistance,23 full HIV RNA history (“HIV RNA/Electronic Adherence/Clinical”) to the set of candidate predictor variables. The latter included most recent HIV RNA value, time since most recent HIV RNA test, number of previous tests, and an indicator of regimen switch since last test.
Construction of Prediction Models
The Super Learner algorithm was applied to each set of candidate variables in turn to build 4 prediction models for each failure definition (HIV RNA level >400 copies per milliliter and >1000 copies per milliliter). Super Learner is described in detail elsewhere.18,19 In brief, we specified a library of candidate prediction algorithms that included Lasso regression,24 preliminary Lasso screening combined with a generalized additive model,25,26 a generalized boosted model,27 multivariate adaptive polynomial spline regression,28 and main term logistic regression. For comparison, percent of prescribed doses taken over the past 3 months was evaluated as a single time-updated predictor. To achieve a more balanced data set and reduce computation time, all HIV RNA levels above failure threshold and an equal number of randomly sampled HIV RNA levels below threshold were used to build the models, with weights used to correct for sampling. The ability of this approach to build predictors with comparable or improved performance on the full sample is supported by theory and prior analyses,29 and was verified for this data set using selected prediction models. Ten-fold cross-validation was used to choose the convex combination of algorithms in the library that achieved the best performance on data not used in model fitting, with the negative log likelihood used as loss function (ie, performance metric).30
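The convex-combination step can be sketched as follows. This toy Python version (the study used the SuperLearner R package, which optimizes the weights directly) grid-searches weights for 2 candidate algorithms' cross-validated predictions to minimize the negative log likelihood; all names are illustrative.

```python
import math

def cv_neg_log_lik(weights, preds_by_alg, y):
    """Mean negative log likelihood of convex-combined predicted
    probabilities against binary outcomes y."""
    eps = 1e-12
    nll = 0.0
    for i, yi in enumerate(y):
        p = sum(w * preds[i] for w, preds in zip(weights, preds_by_alg))
        p = min(max(p, eps), 1 - eps)  # guard against log(0)
        nll -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return nll / len(y)

def super_learner_weights(preds_by_alg, y, step=0.05):
    """Toy stand-in for Super Learner's weight selection: grid search
    over convex combinations of 2 algorithms' cross-validated
    predictions, minimizing negative log likelihood."""
    best_w, best_loss = None, float("inf")
    w1 = 0.0
    while w1 <= 1.0 + 1e-9:
        loss = cv_neg_log_lik([w1, 1 - w1], preds_by_alg, y)
        if loss < best_loss:
            best_w, best_loss = (w1, 1 - w1), loss
        w1 += step
    return best_w, best_loss
```

In the actual algorithm, each candidate's predictions are themselves obtained out-of-fold (10-fold cross-validation), so the selected combination is rewarded for performance on data not used in model fitting.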
Evaluation of Prediction Models
The performance of the Super Learner prediction model was evaluated using a second level of cross-validation, ensuring that performance was evaluated with data not used to build the model. The data were split into 10 folds (partitions) of roughly equal size, with all repeated measures from a given individual contained in a single fold. The Super Learner algorithm was run on each training set (containing 9 of the 10 folds) and the performance of the selected model was evaluated with the corresponding validation set, which consisted exclusively of subjects not used to build the prediction model. Cross-validated performance measures averaged performance across validation sets.
Area under the receiver operating characteristic curve (AUROC) was used to summarize classification performance across the range of possible cutoffs for classifying a given test based on predicted failure probability. Standard error estimates, accounting for repeated measures on an individual, were calculated based on the influence curve.31 For each of a range of cutoffs, we also calculated sensitivity, specificity, negative and positive predictive value, and the proportion of HIV RNA tests for which the predicted probability of failure was below the cutoff. We refer to the latter metric as “capacity savings” because it reflects, for a given cutoff, the proportion of HIV RNA tests that would have been avoided under a hypothetical selective testing strategy in which HIV RNA tests were ordered only when a subject's predicted probability of failure on a given day on which an HIV RNA test was scheduled was greater than the cutoff.
We estimated capacity savings corresponding to a range of sensitivities. For example, for each validation set, we selected the largest cutoff for which sensitivity was ≥95%, calculated the proportion of HIV RNA tests that had a predicted probability less than this cutoff, and averaged these proportions across validation sets. This provided an estimate of the proportion of HIV RNA tests that could have been avoided under a hypothetical selective testing rule chosen to detect at least 95% of failures without additional delay.
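The cutoff selection within a single validation set can be sketched as follows (illustrative Python; in the paper this was done per fold and the resulting proportions were averaged across the 10 folds):

```python
def capacity_savings_at_sensitivity(probs, failures, target_sens=0.95):
    """Find the largest cutoff keeping sensitivity >= target_sens when
    tests are ordered only for predicted probabilities at or above the
    cutoff, and report the fraction of tests avoided (predictions below
    the cutoff). Sketch for one validation fold, not the study's code."""
    n_fail = sum(failures)
    best_cut, best_savings = 0.0, 0.0
    for c in sorted(set(probs)):  # scan candidate cutoffs, ascending
        detected = sum(1 for p, f in zip(probs, failures) if f and p >= c)
        if n_fail and detected / n_fail >= target_sens:
            best_cut = c  # later (larger) qualifying cutoffs overwrite
            best_savings = sum(1 for p in probs if p < c) / len(probs)
    return best_cut, best_savings
```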
We conducted a basic analysis to provide a rough estimate of the cost per month at which an electronic adherence monitoring system would remain cost neutral. For each candidate prediction model, we first estimated the gross cost savings that would be achieved with the cutoff chosen to maintain sensitivity ≥95%, as the number of postbaseline HIV RNA tests below the cutoff multiplied by the estimated combined cost of an outpatient visit and HIV RNA test. We used an estimated unit cost for an outpatient primary care visit based on the 2014 Medicare National Physician Fee Schedule ($90–$140 for CPT-4 code 99214) and a unit cost for an HIV RNA test of $50–$90. We divided gross cost savings by total person-time under MEMS follow-up. Person-time was calculated from the beginning of MEMS monitoring to the earlier of the last postbaseline HIV RNA test date and the end of MEMS monitoring.
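The arithmetic of this cost-neutrality bound reduces to a single ratio; the figures in the sketch below are illustrative, not the study's estimates.

```python
def cost_neutral_monthly_budget(n_tests_deferred, visit_cost,
                                test_cost, person_months):
    """Gross savings from deferred tests spread over person-months of
    monitoring: the ceiling an adherence monitoring system could cost
    per person-month while remaining cost neutral."""
    return n_tests_deferred * (visit_cost + test_cost) / person_months

# e.g., 100 deferred tests at a $90 visit + $50 test,
# over 1000 person-months of monitoring -> $14 per person-month
budget = cost_neutral_monthly_budget(100, 90, 50, 1000)
```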
Analyses were performed using MATLAB and R,32,33 including the ROCR, SuperLearner, and cvAUC R packages.34–36
Of the 2835 patients in the MACH14 cohort, 1478 patients met the inclusion criteria and contributed a total of 3096 postbaseline HIV RNA tests to prediction of failure defined using a threshold of 1000 copies per milliliter. Consecutive postbaseline HIV RNA tests were a median of 45 days apart [interquartile range (IQR), 28–99; Table 1]. Baseline and final HIV RNA tests were a median of 163 days apart (IQR, 82–282). When failure was defined as >400 copies per milliliter, 2751 postbaseline HIV RNA tests met the inclusion criteria; baseline and final HIV RNA tests were a median of 143 days apart (IQR, 66–273).
At baseline, subjects had known of their HIV status for a median of 9.2 years (IQR, 4.0–16.1), and the majority (83%) had past exposure to ART (Table 1). A median of 78 days of MEMS data (IQR, 28–91) were available in the 3 months preceding HIV RNA tests conducted after the baseline assessment. Over the 3 months preceding each postbaseline HIV RNA test, median percent doses taken, as measured by MEMS, was 84% (IQR, 50–98) and subjects had experienced a median of 4 interruptions ≥24 hours in MEMS events (IQR, 1–11). Using a threshold of 1000 and 400 copies per milliliter, respectively, 20% and 27% of postbaseline HIV RNA tests were failures and 41% and 51% of patients failed.
Classification of Failure
The Super Learner algorithm applied to each of the 3 predictor sets that included MEMS data resulted in a higher cross-validated AUROC than MEMS-based measurement of 3-month percent doses taken (Fig. 1 and Table 2; P < 0.001 for each pairwise comparison). The AUROC for classification of failure by 3-month percent doses taken was 0.64 [95% confidence interval (CI): 0.61 to 0.67] and 0.60 (95% CI: 0.57 to 0.63) for HIV RNA failure thresholds of 1000 and 400 copies per milliliter, respectively.
With failure defined as >1000 copies per milliliter, the Super Learner model based on nonvirological clinical predictors achieved moderate classification performance (“Clinical” AUROC: 0.71, 95% CI: 0.69 to 0.74). Addition of MEMS data significantly improved performance (“Electronic Adherence/Clinical” AUROC: 0.78, 95% CI: 0.75 to 0.80, P < 0.001). Addition of a single baseline HIV RNA level (“Baseline HIV RNA/Electronic Adherence/Clinical” AUROC: 0.82, 95% CI: 0.79 to 0.84) or full HIV RNA history (“HIV RNA/Electronic Adherence/Clinical” AUROC: 0.86, 95% CI: 0.84 to 0.88) improved performance further (P < 0.01 for each pairwise comparison). With failure defined as >400 copies per milliliter, corresponding AUROCs were slightly higher (Table 2). Figure 2 shows cross-validated sensitivity, specificity, and positive and negative predictive value for each of the Super Learner models across a range of cutoffs.
Selective HIV RNA Testing
Figure 3 shows cross-validated estimated capacity savings for a range of sensitivities. With failure defined as >1000 copies per milliliter and the cutoff chosen to maintain sensitivity ≥95%, a testing rule based on nonvirological clinical predictors alone achieved a capacity savings of 15%. Capacity savings increased with addition of MEMS data to 22%, with further addition of a single baseline HIV RNA level to 31%, and with addition of full HIV RNA history to 41%. Estimated cost savings per person-month of MEMS follow-up increased from $9–$14 with nonvirological clinical data alone to $18–$39 after incorporating MEMS and HIV RNA data (Table 2). Capacity savings at 95% sensitivity were slightly lower with a failure threshold of 400 copies per milliliter (Table 2).
We examined how well electronic adherence (MEMS) and clinical data, analyzed using Super Learner, could classify virological failure (HIV RNA >400 or >1000 copies per milliliter) in a heterogeneous population of HIV-positive patients in the United States. MEMS-based measurement of percent doses taken resulted in poor classification of failure (AUROC, 0.60–0.64), as reported by others (AUROC, <0.67).8 Super Learner applied to MEMS and nonvirological clinical data significantly improved AUROC (0.78–0.79). MEMS data also significantly improved classification of failure over nonvirological clinical data alone, and addition of past HIV RNA data improved classification further.
Although MEMS does not transmit data in real time, it provides proof of concept that real-time electronic adherence monitoring, which is now available through other devices, could inform frequency and timing of HIV RNA testing.13,37,38 To investigate the potential for an electronic adherence-based algorithm to reserve HIV RNA testing for patients with non-negligible risk of failure, we evaluated the extent to which testing could have been reduced while detecting almost all failures without additional delay. Using a selective testing approach based on MEMS and clinical data, we estimated that 18%–31% of HIV RNA tests could have been avoided (depending on failure definition and inclusion of a single baseline HIV RNA level) while missing no more than 5% of subjects failing on a given date. We were unable to evaluate the extent to which any missed failures would have been detected at a later date under a selective testing strategy because in our data failures that would have been missed were instead detected and responded to. Our analyses therefore censored data at date of first-detected failure.
Addition of past HIV RNA data improved capacity savings further to 34%–41%. If a selective testing strategy were applied in practice, a reduced set of HIV RNA data would be available to guide real-time decision making (some HIV RNA tests would not have been ordered). Therefore, capacity savings using an algorithm that incorporates full HIV RNA history should be seen as an upper bound.
Our results suggest an immediate gross cost savings of roughly $16–$29 per person-month of MEMS use, assuming access to a single baseline HIV RNA measurement. This estimate represents an upper limit for what the health care system might reasonably spend on all components of a real-time adherence monitoring system, which would need to include devices, monitoring staff, and data transmission, while remaining cost neutral. Our cost savings estimate was conservative, however, in that only HIV RNA tests with MEMS monitoring in the prior month could be deferred, and reported interruptions in monitoring were included in total MEMS time. We did not account for costs incurred by changes in second-line regimen use, resistance testing, or disease progression, and we assumed that a reduction in HIV RNA testing would save both laboratory and associated clinic visit costs; excluding visit costs would reduce our estimated savings. Our estimates might also differ substantially under different HIV RNA monitoring schedules, in populations with different virological failure rates, and in settings where MEMS was used with fewer interruptions and over longer durations.
Furthermore, we conservatively assumed that real-time monitoring conferred no benefit and evaluated a hypothetical system under which MEMS could be used to defer regularly scheduled tests, but not to trigger extra or early tests. In practice, real-time electronic adherence monitoring could be used to trigger tests between scheduled testing dates, which, although it would increase cost, could introduce additional, potentially significant benefits, including opportunities both to detect failure earlier and to prevent it from occurring.39 Although frequency of HIV RNA monitoring in our cohort was similar to that seen in standard clinical care, our results should be viewed as initial proof of concept requiring future investigation, including full cost-effectiveness analysis.
More efficient strategies for HIV RNA testing are especially important in resource-limited settings where standard HIV RNA testing is cost prohibitive. A number of possible strategies for selective HIV RNA testing in resource-limited settings have been proposed. WHO-recommended CD4-based criteria have been repeatedly shown to have poor sensitivity for detecting failure,40–43 whereas a number of alternative clinical and CD4-based rules proposed for selective HIV RNA testing also exhibited moderate to poor sensitivity in validation data sets.4–6 However, a recent clinical prediction score that incorporated self-reported adherence, regular hemoglobin and CD4 monitoring, and new-onset papular pruritic rash achieved more promising performance in Cambodia,22,44 as did pharmacy refill data, alone or in combination with CD4 counts, in South Africa.7 Robbins et al45 also achieved good classification of failure among US patients using adherence data abstracted from clinical notes, drug and alcohol use, and past appointment history in addition to HIV RNA, CD4, and ART data. Although these studies were conducted using predictor variables not available in this study, they provide additional proof of concept for a strategy of selective HIV RNA testing that incorporates adherence measures. The utility and cost-effectiveness of incorporating electronic adherence data in such a strategy remains to be determined. We found that a selective testing approach based on CD4 count and ART measures alone could achieve a 14%–15% potential capacity savings while maintaining sensitivity at 95%. The performance of our classification rule might have been improved by including more extensive clinical history and self-reporting of nonadherence.
Thus, although our study provides insight into classification performance achievable in settings where rich and carefully documented longitudinal clinical data are not available but electronic adherence data are, further investigation of the value added by electronic adherence data in settings with more extensive patient data remains of interest.
Our analyses focused on a machine-learning method (Super Learner) to build optimal predictors of virological failure and evaluated the extent to which the resulting risk scores could be used to reduce HIV RNA testing while maintaining sensitivity at a fixed level if all patients with risk scores above a cutoff were tested. We focused on capacity savings rather than specificity because any initial false positives would be correctly classified by subsequent HIV RNA testing under this approach. Recently developed methods instead take the risk score and a constraint on testing as given and develop optimal tripartite rules, in which only patients with an intermediate risk score are tested.46 Combined application of these approaches is an exciting area of future research.
The patient population in this study included patients with varied clinical histories receiving care in a range of locations throughout the United States; value added by electronic adherence data may vary in different populations. Furthermore, subjects were followed as part of different studies, with distinct protocols for data collection and quality control. We excluded any reported periods of MEMS nonuse from our adherence calculations; however, gaps in MEMS events might still reflect device malfunction or nonuse rather than missed doses. Real-time reporting of electronic adherence data may improve ability to distinguish between these possibilities through real-time patient queries and thus may improve classification performance achievable with MEMS data still further.13,38
In summary, our results support Super Learner as a promising approach to developing algorithms for selective HIV RNA testing based on the complex data generated by electronic adherence monitoring in combination with readily available clinical variables. A patient's risk of current virological failure, based on time-updated clinical and MEMS data, could be made available to clinicians in real time (eg, as an automated calculation in an electronic medical record or smart phone application) to help determine whether a clinic visit and/or HIV RNA test is indicated, allowing for personalized testing and visit schedules. Our results provide initial proof of concept for the potential of such an approach to reduce costs while maintaining outcomes.
1. Hatano H, Hunt P, Weidler J, et al. Rate of viral evolution and risk of losing future drug options in heavily pretreated, HIV-infected patients who continue to receive a stable, partially suppressive treatment regimen. Clin Infect Dis. 2006;43:1329–1336.
2. Petersen ML, van der Laan MJ, Napravnik S, et al. Long-term consequences of the delay between virologic failure of highly active antiretroviral therapy and regimen modification. AIDS. 2008;22:2097–2106.
3. Barth RE, Wensing AM, Tempelman HA, et al. Rapid accumulation of nonnucleoside reverse transcriptase inhibitor-associated resistance: evidence of transmitted resistance in rural South Africa. AIDS. 2008;22:2210–2212.
4. Abouyannis M, Menten J, Kiragga A, et al. Development and validation of systems for rational use of viral load testing in adults receiving first-line ART in sub-Saharan Africa. AIDS. 2011;25:1627–1635.
5. Meya D, Spacek LA, Tibenderana H, et al. Development and evaluation of a clinical algorithm to monitor patients on antiretrovirals in resource-limited settings using adherence, clinical and CD4 cell count criteria. J Int AIDS Soc. 2009;12:3.
6. Colebunders R, Moses KR, Laurence J, et al. A new model to monitor the virological efficacy of antiretroviral treatment in resource-poor countries. Lancet Infect Dis. 2006;6:53–59.
7. Bisson GP, Gross R, Bellamy S, et al. Pharmacy refill adherence compared with CD4 count changes for monitoring HIV-infected adults on antiretroviral therapy. PLoS Med. 2008;5:e109.
8. Genberg BL, Wilson IB, Bangsberg DR, et al. Patterns of antiretroviral therapy adherence and impact on HIV RNA among patients in North America. AIDS. 2012;26:1415–1423.
9. de Boer IM, Prins JM, Sprangers MA, et al. Using different calculations of pharmacy refill adherence to predict virological failure among HIV-infected patients. J Acquir Immune Defic Syndr. 2010;55:635–640.
10. McMahon JH, Manoharan A, Wanke CA, et al. Pharmacy and self-report adherence measures to predict virological outcomes for patients on free antiretroviral therapy in Tamil Nadu, India. AIDS Behav. 2013;17:2253–2259.
11. Goldman JD, Cantrell RA, Mulenga LB, et al. Simple adherence assessments to predict virologic failure among HIV-infected adults with discordant immunologic and clinical responses to antiretroviral therapy. AIDS Res Hum Retroviruses. 2008;24:1031–1035.
12. Liu H, Golin CE, Miller LG, et al. A comparison study of multiple measures of adherence to HIV protease inhibitors. Ann Intern Med. 2001;134:968–977.
13. Haberer JE, Robbins GK, Ybarra M, et al. Real-time electronic adherence monitoring is feasible, comparable to unannounced pill counts, and acceptable. AIDS Behav. 2012;16:375–382.
14. Oyugi JH, Byakika-Tusiime J, Ragland K, et al. Treatment interruptions predict resistance in HIV-positive individuals purchasing fixed-dose combination antiretroviral therapy in Kampala, Uganda. AIDS. 2007;21:965–971.
15. Parienti JJ, Ragland K, Lucht F, et al. Average adherence to boosted protease inhibitor therapy, rather than the pattern of missed doses, as a predictor of HIV RNA replication. Clin Infect Dis. 2010;50:1192–1197.
16. Parienti JJ, Das-Douglas M, Massari V, et al. Not all missed doses are the same: sustained NNRTI treatment interruptions predict HIV rebound at low-to-moderate adherence levels. PLoS One. 2008;3:e2783.
17. Parienti JJ, Massari V, Descamps D, et al. Predictors of virologic failure and resistance in HIV-infected patients treated with nevirapine- or efavirenz-based antiretroviral therapy. Clin Infect Dis. 2004;38:1311–1316.
18. Polley EC, van der Laan MJ. Super Learner in Prediction. U.C. Berkeley Division of Biostatistics Working Paper Series; 2010. Available at: http://biostats.bepress.com/ucbbiostat/paper266/. Accessed February 16, 2015.
19. van der Laan MJ, Polley EC, Hubbard AE. Super learner. Stat Appl Genet Mol Biol. 2007;6:Article25.
20. Liu H, Wilson IB, Goggin K, et al. MACH14: a multi-site collaboration on ART adherence among 14 institutions. AIDS Behav. 2013;17:127–141.
21. Havlir DV, Koelsch KK, Strain MC, et al. Predictors of residual viremia in HIV-infected patients successfully treated with efavirenz and lamivudine plus either tenofovir or stavudine. J Infect Dis. 2005;191:1164–1168.
22. Phan V, Thai S, Koole O, et al. Validation of a clinical prediction score to target viral load testing in adults with suspected first-line treatment failure in resource-constrained settings. J Acquir Immune Defic Syndr. 2013;62:509–516.
23. Rosenblum M, Deeks SG, van der Laan M, et al. The risk of virologic failure decreases with duration of HIV suppression, at greater than 50% adherence to antiretroviral therapy. PLoS One. 2009;4:e7196.
24. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B. 1996;58:267–288.
25. Hastie T, Tibshirani R. Generalized additive models. Stat Sci. 1986;1:297–318.
26. Hastie T. Generalized additive models. In: Chambers JM, Hastie TJ, eds. Statistical Models in S. Boca Raton, FL: Chapman & Hall/CRC Press. 1992:249–304.
27. Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Stat. 2001;29:1189–1232.
28. Friedman JH. Multivariate adaptive regression splines (with discussion). Ann Stat. 1991;19:1–141.
29. Rose S, Fireman B, van der Laan M. Nested case-control risk score prediction. In: van der Laan M, Rose S, eds. Targeted Learning: Causal Inference for Observational and Experimental Data. New York, NY: Springer; 2011:239–245.
30. van der Laan M, Dudoit S, Keles S. Asymptotic optimality of likelihood-based cross-validation. Stat Appl Genet Mol Biol. 2006;3:Article4.
31. LeDell E, Petersen ML, van der Laan MJ. Computationally Efficient Confidence Intervals for Cross-validated Area Under the ROC Curve Estimates. U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 312; 2012.
32. MATLAB 8.0 and Statistics Toolbox 8.1, The MathWorks, Inc., Natick, Massachusetts, United States.
33. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2013. ISBN 3-900051-07-0. Available at: http://www.R-project.org/. Accessed February 16, 2015.
34. Sing T, Sander O, Beerenwinkel N, et al. ROCR: visualizing the performance of scoring classifiers. R package version 1.0-4; 2009. Available at: http://CRAN.R-project.org/package=ROCR. Accessed February 16, 2015.
35. Polley E, van der Laan MJ. SuperLearner: Super Learner Prediction. R package version 2.0-9; 2012. Available at: http://CRAN.R-project.org/package=SuperLearner. Accessed February 16, 2015.
36. LeDell E, Petersen ML, van der Laan MJ. cvAUC: Cross-Validated Area Under the ROC Curve Confidence Intervals. R package version 1.0-0; 2013. Available at: http://CRAN.R-project.org/package=cvAUC. Accessed February 16, 2015.
37. Haberer JE, Kahane J, Kigozi I, et al. Real-time adherence monitoring for HIV antiretroviral therapy. AIDS Behav. 2010;14:1340–1346.
38. Haberer JE, Kiwanuka J, Nansera D, et al. Real-time adherence monitoring of antiretroviral therapy among HIV-infected adults and children in rural Uganda. AIDS. 2013;27:2166–2168.
39. Freedberg KA, Hirschhorn LR, Schackman BR, et al. Cost-effectiveness of an intervention to improve adherence to antiretroviral therapy in HIV-infected patients. J Acquir Immune Defic Syndr. 2006;43:S113–S118.
40. Westley BP, DeLong AK, Tray CS, et al. Prediction of treatment failure using 2010 World Health Organization guidelines is associated with high misclassification rates and drug resistance among HIV-infected Cambodian children. Clin Infect Dis. 2012;55:432–440.
41. Ferreyra C, Yun O, Eisenberg N, et al. Evaluation of clinical and immunological markers for predicting virological failure in a HIV/AIDS treatment cohort in Busia, Kenya. PLoS One. 2012;7:e49834.
42. Ingole N, Mehta P, Pazare A, et al. Performance of immunological response in predicting virological failure. AIDS Res Hum Retroviruses. 2013;29:541–546.
43. Keiser O, MacPhail P, Boulle A, et al. Accuracy of WHO CD4 cell count criteria for virological failure of antiretroviral therapy. Trop Med Int Health. 2009;14:1220–1225.
44. Lynen L, An S, Koole O, et al. An algorithm to optimize viral load testing in HIV-positive patients with suspected first-line antiretroviral therapy failure in Cambodia. J Acquir Immune Defic Syndr. 2009;52:40–48.
45. Robbins GK, Johnson KL, Chang Y, et al. Predicting virologic failure in an HIV clinic. Clin Infect Dis. 2010;50:779–786.
46. Liu T, Hogan JW, Wang L, et al. Optimal allocation of gold standard testing under constrained availability: application to assessment of HIV treatment failure. J Am Stat Assoc. 2013;108:1173–1188.