In response to the HIV/AIDS pandemic, the WHO launched the ‘3 × 5’ initiative in 2003. Within this, treatment guidelines were issued for healthcare professionals scaling up antiretroviral treatment (ART) in resource-limited settings, in which clinicians often do not have access to the laboratory tests used in industrialized countries for monitoring ART, particularly HIV viral loads. These guidelines were developed by consensus and updated in 2006. Defining treatment failure in the absence of viral load measurements poses a particular challenge in the public health approach to ART delivery. The guidelines propose criteria for defining treatment failure based on either CD4 cell counts, or clinical manifestations of disease progression, or both, but these criteria have not been validated.
The aim of this study was to evaluate the performance of CD4 and clinical criteria, as defined by WHO guidelines, as indicators of virological treatment failure. We used data from a workplace ART programme in South Africa to assess these criteria among individuals attending a clinic visit 12 months after starting ART. For each individual, we evaluated whether the criteria for CD4 or clinically defined treatment failure, as described in the WHO guidelines [2,3], were fulfilled, and compared these results with the ‘gold standard’ of treatment failure as defined by viral load criteria.
The study population consisted of employees attending workplace-based ART clinics at two gold mining sites in South Africa. The programme has been described elsewhere. Criteria for starting ART in this programme are a WHO stage 4 condition; a WHO stage 3 condition and a CD4 cell count of less than 350 cells/μl; or a CD4 cell count of less than 250 cells/μl irrespective of clinical stage. The standard first-line ART regimen is a combination of zidovudine, lamivudine and efavirenz. After starting ART, individuals have routine clinical assessments at 2, 6, 12, 24, 36 and 48 weeks. CD4 cell count and viral load are monitored prior to ART initiation (baseline) and at 6, 24 and 48 weeks.
The study cohort consisted of individuals commencing ART for the first time between 13 November 2002 and 7 August 2004 who attended a clinical review visit at 12 months after ART start and for whom an HIV viral load result was available. The lower and upper limits allowable for the 12-month visit were 300 and 450 days, respectively. If there was more than one CD4 cell count or viral load result within this time window, we used the latest available, to maximize the probability of accurately capturing treatment failure.
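The window rule above can be sketched in a few lines of Python. This is only an illustration, not the programme's actual extraction code: the function name `latest_in_window`, the `(date, value)` tuple layout and the choice to treat both window limits as inclusive are assumptions made for the example.

```python
from datetime import date

WINDOW_START_DAYS = 300  # lower limit of the 12-month visit window
WINDOW_END_DAYS = 450    # upper limit of the 12-month visit window


def latest_in_window(art_start, results,
                     lo=WINDOW_START_DAYS, hi=WINDOW_END_DAYS):
    """Return the latest (date, value) result taken lo..hi days after ART start.

    `results` is an iterable of (date, value) tuples (hypothetical layout).
    Both limits are treated as inclusive, which is an assumption.
    Returns None when no result falls inside the window.
    """
    in_window = [(d, v) for d, v in results
                 if lo <= (d - art_start).days <= hi]
    return max(in_window, key=lambda r: r[0]) if in_window else None


# Illustrative (made-up) viral load results for one individual.
art_start = date(2003, 1, 1)
viral_loads = [
    (date(2003, 7, 1), 50),     # day 181: too early for the 12-month window
    (date(2003, 11, 15), 900),  # day 318: inside the window
    (date(2004, 2, 1), 1200),   # day 396: inside the window and latest
    (date(2004, 6, 1), 5000),   # day 517: too late
]
chosen = latest_in_window(art_start, viral_loads)
```

The same helper could be reused for the 6-month CD4 window by passing `lo=180, hi=240`.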
Individuals who had either stopped treatment completely or switched treatment to a second-line therapy prior to the 12-month visit were excluded. A complete stop of treatment was defined by a complete cessation of all ART drugs, with no restart of therapy by 450 days. Those with temporary discontinuations in treatment were included provided they restarted first-line ART therapy by the 12-month visit. Individuals were not excluded if alternative first-line drugs were substituted for those in the initial regimen due to intolerance.
Data on clinical events and changes to ART were collected on standardized forms at clinic visits and from inpatient episodes. Data were entered into a relational database by trained data entry staff; laboratory data were downloaded electronically into the same database. Anonymized extracts from this database provided the main source data for this study. Data from tuberculosis (TB) clinic records were also used to ensure no TB episodes were missed.
Definitions of treatment failure
The guidelines for this ART programme recommend that an HIV viral load greater than 400 copies/ml at 12 months should be repeated, but repeat results were not always available, and our definitions were designed to take account of this.
Treatment failure defined by HIV viral load criteria was classified as definite if there was a single viral load at 12 months of greater than 10 000 copies/ml, as this is in keeping with the most recent WHO recommendations and associated with risk of clinical disease progression; or probable if there was either a single viral load at 12 months greater than 1000 copies/ml, or a viral load at 12 months of at least 400 copies/ml followed by a second reading of at least 400 copies/ml more than 30 days later and within the window defined for the 12-month visit (300–450 days). We defined an adequate virological response, which may be alternatively referred to as a successful response to treatment, as a viral load measurement at 12 months less than 400 copies/ml.
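As a rough sketch of how these viral load categories fit together, the rule can be written as a single function. The function name, argument names and the "unclassified" label are hypothetical; the sketch also assumes the caller has already checked that any repeat reading lies within the 300–450 day window.

```python
def classify_virological(vl_12m, repeat_vl=None, repeat_gap_days=None):
    """Classify the 12-month virological outcome (viral loads in copies/ml).

    vl_12m          -- viral load at the 12-month visit
    repeat_vl       -- optional second reading inside the visit window
    repeat_gap_days -- days between the two readings, if a repeat exists
    """
    if vl_12m > 10_000:
        return "definite"       # single value above 10 000 copies/ml
    if vl_12m > 1_000:
        return "probable"       # single value above 1000 copies/ml
    if vl_12m >= 400:
        confirmed = (
            repeat_vl is not None
            and repeat_vl >= 400
            and repeat_gap_days is not None
            and repeat_gap_days > 30
        )
        # Without a confirming repeat, the result is neither failure
        # nor adequate response.
        return "probable" if confirmed else "unclassified"
    return "adequate"           # below 400 copies/ml
```

For example, `classify_virological(600)` yields `"unclassified"`, whereas `classify_virological(600, repeat_vl=800, repeat_gap_days=45)` yields `"probable"`.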
Treatment failure, based on CD4 criteria for the CD4 cell count measured at the 12-month visit (300–450 days), was defined according to WHO guidelines as either a return to below the pretherapy baseline value or fall in CD4 cell count to less than 50% of the maximum CD4 cell count while on therapy. In accordance with these guidelines, a CD4 cell count below 100 cells/μl was always considered to define failure; and a CD4 cell count above 200 cells/μl was never defined as failure. In addition, a CD4 cell count that fulfilled any of the above criteria for treatment failure but was within 30 days of a concurrent episode of infection was not considered to define treatment failure. The pretherapy baseline value was defined as the closest CD4 cell count to the date of ART initiation, within 90 days prior to the start of ART. The maximum value was determined from analysis of all values after the start of ART, prior to the record of CD4 cell count that indicated failure.
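The order in which these CD4 rules apply can be made explicit with a short sketch. The function and parameter names are hypothetical, and the precedence shown (infection exclusion first, then the 200 and 100 cells/μl bounds, then the baseline and 50% rules) is one reading of the text above rather than a confirmed implementation.

```python
def cd4_failure(cd4_now, baseline, on_art_max, concurrent_infection=False):
    """Return True if a CD4 count (cells/ul) meets the failure criteria.

    cd4_now              -- CD4 count at the visit being assessed
    baseline             -- pretherapy baseline CD4 count
    on_art_max           -- maximum CD4 count recorded on ART before this one
    concurrent_infection -- infection episode within 30 days of the count
    """
    if concurrent_infection:
        return False   # counts near an infection episode never define failure
    if cd4_now > 200:
        return False   # above 200 cells/ul is never failure
    if cd4_now < 100:
        return True    # below 100 cells/ul is always failure
    # Otherwise: below baseline, or below 50% of the on-treatment maximum.
    return cd4_now < baseline or cd4_now < 0.5 * on_art_max
```

For instance, a count of 150 cells/μl with a baseline of 100 and an on-treatment maximum of 320 is classed as failure (150 is below half of 320), whereas the same count with a maximum of 200 is not.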
In a secondary analysis, we also categorized an individual as having treatment failure on CD4 cell count criteria if the CD4 cell count at the 6-month (180–240 days) visit fulfilled the above criteria.
Treatment failure based on clinical events was defined as the occurrence of either a new or recurrent WHO-defined stage 3 or stage 4 event at least 6 months after the start of ART. ‘Severe weight loss’ (WHO stage 3) was defined as a measured weight loss of at least 10% compared with the maximum value recorded on ART. We considered any bacterial infection that resulted in hospitalization as severe and, thus, fulfilling the criteria for categorization as a WHO stage 3 condition. The records of individuals with clinical failure were reviewed by an experienced HIV clinician (A.D.G.), to identify cases in which there was an explanation other than treatment failure accounting for conditions that would otherwise be categorized as fulfilling the criteria for WHO stages 3 or 4, for example, if there was strong evidence that the condition predated the start of ART, or, in the case of severe weight loss, was explicable by a concurrent clinical condition.
Clinical events were not considered to define clinical treatment failure if they occurred in the first 6 months (taken as 180 days) after ART start, in line with WHO definitions, or if they occurred after the date of the 12 month viral load measurement. Further, clinical events occurring more than 6 months after starting ART that were noted by the study clinician to represent possible immune reconstitution syndrome (IRS) were not considered to define treatment failure.
Statistical analyses were carried out using Intercooled Stata software version 9 (Stata Corp., College Station, Texas, USA). The sensitivity, specificity, positive and negative predictive values [with 95% confidence intervals (CI), calculated using the exact binomial interval method of Clopper and Pearson] for clinical and CD4 criteria for treatment failure were calculated, taking virological treatment failure as the gold standard. In the primary analysis, we considered the gold standard for treatment failure to be either definite or probable virological failure; in a secondary analysis, the gold standard was restricted to definite virological failure. Individuals with viral load results indicating neither adequate virological suppression (all viral loads less than 400 copies/ml) nor virological failure according to the criteria described were excluded from the analysis.
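The analysis itself was run in Stata; the following Python sketch only illustrates the same calculations using the standard library. The function names are hypothetical, the exact interval is found by bisection on the binomial tail probabilities (one of several equivalent ways to obtain Clopper-Pearson limits), and the 2 × 2 counts in the usage lines are back-calculated from the percentages reported in the Results for CD4 criteria at 12 months, so they should be read as an assumption.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def _bisect(func, target, increasing):
    """Find p in [0, 1] with func(p) == target, for a monotone func."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided CI for x successes out of n."""
    lower = 0.0 if x == 0 else _bisect(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    upper = 1.0 if x == n else _bisect(
        lambda p: binom_cdf(x, n, p), alpha / 2, increasing=False)
    return lower, upper


def diagnostic_accuracy(tp, fp, fn, tn):
    """Return {measure: (estimate, ci_lower, ci_upper)} from a 2x2 table."""
    measures = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {name: (x / n, *clopper_pearson(x, n))
            for name, (x, n) in measures.items()}


# Counts back-calculated (assumption) for CD4 criteria at 12 months:
# 33 virological failures, 286 non-failures, 19 CD4-defined failures.
result = diagnostic_accuracy(tp=7, fp=12, fn=26, tn=274)
sens, sens_lo, sens_hi = result["sensitivity"]
```

Bisection is used here only to keep the example free of third-party dependencies; a statistics library with a beta quantile function would give the same limits more directly.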
As the WHO guidelines suggest that TB may be a poor marker for treatment failure, in a secondary analysis we evaluated a definition of clinical failure that excluded TB episodes, and we similarly evaluated a definition excluding weight loss.
The research ethics committees of the University of KwaZulu-Natal, South Africa, and the London School of Hygiene & Tropical Medicine approved this study.
Of the 676 ART-naive individuals with baseline CD4 data, 423 (63%) remained in the programme and on ART 12 months after commencing treatment. Two individuals were excluded who had switched to second-line therapy before the 12-month visit, both due to virologically defined treatment failure. A further 97 individuals were excluded because of missing 12-month viral load or CD4 data, leaving a study cohort of 324 individuals, whose baseline characteristics are shown in Table 1. Three hundred and sixteen (97.5%) were men. The median age, weight, CD4 cell count and HIV viral load at the start of treatment were 40.2 years [interquartile range (IQR) 35.7–44.7], 65 kg (IQR 60–70), 154 cells/μl (IQR 82–237) and 47 503 copies/ml (IQR 15 600–168 417), respectively.
Treatment failure at 12 months
Five individuals had a single viral load between 401 and 1000 copies/ml at 12 months and, therefore, could not be classified as either having an adequate virological response or as having treatment failure. These individuals were excluded from further analysis, which is based on the remaining 319 classifiable individuals.
At 12 months after starting treatment, 19 (6.0%) of the cohort had definite virological failure (viral load >10 000 copies/ml), and an additional 14 (4.4%) were classified as probable virological failure (single viral load >1000 copies/ml or two viral loads >400 copies/ml at least 30 days apart) (Table 2), giving a total of 33 using either of the two categories. There were 19 (6.0%) individuals who had treatment failure based on CD4 cell count criteria; 12 (3.8%) had a CD4 cell count at 12 months of less than that before starting ART, six (1.9%) had a CD4 cell count of less than 50% of the maximum during therapy and nine (2.8%) had a CD4 cell count of less than 100 cells/μl (Table 2).
By the 12-month visit, based on clinical criteria, 40 (12.5%) individuals were defined as having failed treatment. Eighteen (5.6%) of these experienced weight loss of greater than 10% of the maximum in the absence of an explanatory clinical condition. TB was diagnosed in 11 individuals (3.4%) and was extrapulmonary (WHO stage 4) in four of these. Among the WHO stage 3 conditions, the most common were oral candidiasis (four), bacterial pneumonia (three), oral hairy leukoplakia (two), neutropaenia (two) and one each of bacterial sinusitis and enteritis.
Sensitivity and specificity
Comparing against the gold standard of definite or probable virological failure, both CD4 and clinical criteria for treatment failure had low sensitivities (Table 3). The sensitivity of, for example, any CD4 cell count criterion at the 12-month visit was 21.2% (95% CI 9.0–38.9) and the sensitivity of any stage 3 or 4 clinical event was 15.2% (95% CI 5.1–33.9). A combination of either CD4 or clinical criteria improved sensitivity compared with either alone, to 33.3% (95% CI 18.0–51.8). The sensitivity of clinical criteria decreased still further when weight loss or TB were excluded (to sensitivities of 9.1%, 95% CI 1.9–24.3 and 12.1%, 95% CI 3.4–28.2, respectively). TB alone had the lowest sensitivity of any of the factors studied (3.0%, 95% CI 0.1–15.8).
The specificities were 95.8% (95% CI 92.8–97.8) for treatment failure based on any CD4 criteria at 12 months and 88.1% (95% CI 83.8–91.6) for clinical criteria, falling to 85.6% (95% CI 81.1–89.5) based on either CD4 or clinical criteria. Excluding TB or weight loss from the clinical criteria improved specificities slightly to 90.6% (95% CI 86.6–93.7) and 93.0% (95% CI 89.4–95.7), respectively. Use of TB alone as a failure criterion gave a specificity of 96.5% (95% CI 93.7–98.3). However, with the much larger denominator (286 individuals) used in the calculation of specificity, these results still imply that there are considerable numbers of individuals who would be incorrectly identified as treatment failures using these criteria.
The positive predictive values of CD4, or clinical criteria, or both, were low at 36.8% (95% CI 16.3–61.6), 12.8% (95% CI 4.3–27.4) and 21.2% (95% CI 11.0–34.7), respectively. Thus, using these criteria, relatively few of those diagnosed as having treatment failure will have actually failed on virological criteria. The negative predictive values were all reasonably high: 91.3% (95% CI 87.6–94.3) for overall CD4 cell count, 90.0% (95% CI 85.9–93.3) for overall clinical criteria and 91.8% (95% CI 87.8–94.8) for CD4 or clinical criteria. With the large denominators in these calculations, a considerable number of individuals classified as having adequate virological suppression would in fact have failed treatment.
A secondary analysis was carried out, restricting the gold standard to individuals with definite virological failure (results not shown); there were no substantial differences compared with the results reported above.
When CD4 cell counts at the 6-month time point were considered, 20 individuals fulfilled the CD4 criteria for treatment failure at the 6-month visit (180–240 days). Within this group, four individuals were classified as failing due to having a CD4 cell count less than 50% of the maximum CD4 cell count while on therapy, 13 due to a CD4 cell count less than that pretherapy and nine due to a CD4 cell count of less than 100 cells/μl. For these 20 individuals, 16 no longer fulfilled the CD4 criteria for treatment failure based on their CD4 cell count at 12 months, whereas the other four remained as failures at this time point. Including all individuals with CD4-defined treatment failure at 6 or 12 months gave a total of 35 individuals defined as treatment failures by immunological criteria. An analysis of the predictive ability of immunological criteria against probable or definite virological failure including this extra data resulted in a sensitivity of 21.2% (95% CI 9.0–38.9) and a specificity of 90.6% (95% CI 86.6–93.7), with positive and negative predictive values of 20.6% (95% CI 8.7–37.9) and 90.9% (95% CI 86.9–94.0), respectively.
We also assessed how many clinical events occurring after 6 months on ART could have been considered to define treatment failure, but were excluded because the event was considered by the study clinician to possibly represent IRS. This only occurred in one case, in which the individual had a diagnosis of pulmonary TB at 259 days after starting ART. Inclusion of this case would not have made a major difference to the results reported.
Our study shows that both clinical and CD4 criteria are insensitive in detecting virological treatment failure in this routine programme setting in South Africa. Perhaps more surprisingly, both also had poor specificity, and in the context of a low prevalence of virologically defined treatment failure, this led to low positive predictive values, such that the majority of individuals identified as having treatment failure by either CD4 or clinical criteria actually had adequate virological suppression. This is of concern for treatment programmes in resource-limited settings, because if these guidelines are followed and individuals are diagnosed with treatment failure on clinical or CD4 criteria, they may be switched to second-line treatment, when in reality they have suppressed viral load and no treatment switch is necessary. Second-line ART regimens usually include a protease inhibitor; these drugs are often less convenient for the individual patient in terms of tolerability and requirement for refrigeration, and pose problems for HIV care programmes because they are much more expensive and in practice often not available. Our results support calls for more accurate algorithms to identify treatment failure in such settings.
Unexplained weight loss of at least 10% compared with the maximum recorded since starting ART was a surprisingly common event, even in this workforce population in whom food security is not a major problem: weight loss would be expected to be much more common in settings in which the food supply is less secure. Based on our findings, unexplained weight loss of 10% or more when observed in isolation should not be considered as indicative of treatment failure; similar studies in other settings would be valuable to confirm this. The second most common clinical event was, not surprisingly, TB, and this also lacked specificity as a marker of treatment failure. As there were no WHO stage 4 events other than the extrapulmonary TB in this cohort, we were not able to explore whether WHO stage 4 events were more specific markers than WHO stage 3 events.
Clinical and CD4 criteria separately lacked sensitivity in detecting virological failure, which is not surprising, since clinical experience from industrialized countries suggests that virological failure usually precedes a fall in CD4 cell count or clinical disease progression. Sensitivity was better with a combination of both clinical and CD4 criteria, but remained relatively low. These low sensitivities imply that many true treatment failures as defined by virological criteria would be misdiagnosed as having adequate levels of virological suppression.
In resource-limited settings in which treatment options for those who fail first-line therapy are limited to a single second-line regimen, even if viral load monitoring was available, it could be argued that it is not desirable to switch to the second-line regimen based on virological failure, but rather to defer a treatment switch to preserve the second-line regimen until it is essential, for example if the CD4 cell count is progressively falling. This strategy might have advantages for individual patients but could have adverse public health consequences if individuals maintained on failing regimens transmit resistant virus to others. Further research in this area is needed.
We defined ‘definite’ virological failure as greater than 10 000 copies/ml based on WHO guidelines and clinical studies [3,7], a conservative definition compared with industrialized country guidelines, because a value greater than this is unlikely to represent a ‘blip’ and because sustained viral load levels higher than this may be associated with clinical disease progression [3,7]. Ideally, a repeat blood specimen would be taken to confirm virological failure, but these were often not available in our cohort. For the main analysis, we considered virological failure to include individuals with either definite or probable virological failure, but restricting the analysis to those with ‘definite’ virological failure did not change the results, suggesting that our definition of ‘probable’ virological failure was reasonably robust.
We based our analysis on results at the 12-month visit to include a relatively large number of individuals with complete and comparable data. This analysis reflects the performance of the guidelines in assisting clinicians at the 12-month visit in deciding whether a patient has failed treatment and needs to switch to second-line treatment. We plan to examine the performance of the clinical and CD4 criteria in identifying treatment failure at later time points in future analyses.
The positive predictive value of any diagnostic test is directly related to the prevalence of the condition; hence, the positive predictive values of these criteria for treatment failure would be higher in a population where true treatment failure was more common. Just over 10% of individuals in this analysis had definite or probable virological failure at 12 months, which may appear relatively low. This may be explained by this analysis being restricted to individuals who attended a 12-month visit, had an available viral load result and had not already switched to second-line therapy. Overall, retention within the programme, and virological outcomes among individuals who remain in this ART programme are comparable with those seen in other developing country programmes [6,11,12]. Thus, our results may be generalizable to other treatment programmes in resource-limited settings.
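The dependence of positive predictive value on prevalence follows from Bayes' theorem and can be illustrated with a small sketch. The function name is hypothetical; the sensitivity and specificity used are those implied by the CD4 results at 12 months (counts back-calculated from the reported percentages, an assumption), and the 30% prevalence is a made-up comparison value rather than an estimate from any programme.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


sens = 7 / 33      # CD4 criteria sensitivity (about 21.2%)
spec = 274 / 286   # CD4 criteria specificity (about 95.8%)

ppv_observed = ppv(sens, spec, prevalence=33 / 319)  # prevalence in this cohort
ppv_higher = ppv(sens, spec, prevalence=0.30)        # hypothetical higher prevalence
```

With the cohort prevalence, the formula reproduces the observed positive predictive value for CD4 criteria (7 of 19 CD4-defined failures); with the hypothetical higher prevalence and unchanged sensitivity and specificity, the value is higher.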
Our results are consistent with Canadian data in which WHO CD4 cell count criteria were found to be poor predictors of virological failure by 12 months and with a study from Botswana in which CD4 cell count increases were found to have only moderate predictive power to identify virological suppression at 6 months; CD4 cell count increases were more predictive for individuals with baseline CD4 cell counts below 100 cells/μl. They are also consistent with data from a retrospective cohort of patients in Thailand, which assessed CD4 and clinical criteria for failure defined by Thai national guidelines compared against a gold standard of HIV viral load greater than 50 copies/ml. In this study, the reported sensitivity and specificity for immunological criteria (absence of an increase, or a decrease of 30% from maximum after at least 6 months of ART) were 13.3% and 89.6%, respectively, and for clinical criteria (a new AIDS-associated condition or death after at least 6 months on ART; no details were given on the aetiology of clinical events) were 10.0% and 95.6%, respectively.
In conclusion, our data suggest that WHO criteria for ART failure based on CD4 and clinical criteria perform poorly in identifying virological failure. Given the low positive predictive values, we recommend that individuals suspected to have treatment failure using these criteria undergo further evaluation, preferably with viral load estimation, to confirm virological failure before switching to second-line treatment. This strategy would have cost implications, but the laboratory cost of viral load estimations would likely be offset by large savings in drug costs if individuals are not switched unnecessarily to expensive second-line ART regimens. Further work is needed to develop better algorithms to guide clinicians concerning switches to second-line ART when laboratory facilities are limited.
We thank all the staff of Anglogold Health Services and Aurum Institute for their work in collecting the data used in this study and assistance in its preparation, in particular Mr Michael Eisenstein and Dr Lindiwe Pemba. We are grateful to Dr Jenny Whetham and Dr Mampedi Bogoshi for providing data on outpatient TB episodes and hospital admissions.
Authors' contributions: P.M. was responsible for study design, analysis and interpretation, and drafting the manuscript; K.F. for study design, data collection, analysis and interpretation, and revising the manuscript; S.C. for study design, data collection and revising the manuscript; G.J.C. for study design, data collection and revising the manuscript; and A.D.G. for study concept and design, data analysis and interpretation, and revising the manuscript.
1. World Health Organisation. World Health Organisation treating 3 million by 2005: making it happen: the WHO strategy: the WHO and UNAIDS global initiative to provide antiretroviral therapy to 3 million people with HIV/AIDS in developing countries by the end of 2005/Treat 3 Million by 2005. http://libdoc.who.int/publications/2003/9241591129.pdf [Accessed 2 June 2008].
4. Gilks CF, Crowley S, Ekpini R, Gove S, Perriens J, Souteyrand Y, et al. The WHO public-health approach to antiretroviral treatment against HIV in resource-limited settings. Lancet 2006; 368:505–510.
5. Charalambous S, Grant AD, Day JH, Rothwell E, Chaisson RE, Hayes RJ, Churchyard GJ. Feasibility and acceptability of a specialist clinical service for HIV-infected mineworkers in South Africa. AIDS Care 2004; 16:47–56.
6. Charalambous S, Grant AD, Day JH, Pemba L, Chaisson RE, Kruger P, et al. Establishing a workplace antiretroviral therapy programme in South Africa. AIDS Care 2007; 19:34–41.
7. Murri R, Lepri AC, Cicconi P, Poggio A, Arlotti M, Tositti G, et al. Is moderate HIV viremia associated with a higher risk of clinical progression in HIV-infected people treated with highly active antiretroviral therapy: evidence from the Italian cohort of antiretroviral-naive patients study. J Acquir Immune Defic Syndr 2006; 41:23–30.
8. Clopper CJ, Pearson ES. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 1934; 26:404–413.
9. Colebunders R, Moses KR, Laurence J, Shihab HM, Semitala F, Lutwama F, et al. A new model to monitor the virological efficacy of antiretroviral treatment in resource-poor countries. Lancet Infect Dis 2006; 6:53–59.
10. Deeks SG, Barbour JD, Grant RM, Martin JN. Duration and predictors of CD4 T-cell gains in patients who continue combination therapy despite detectable plasma viremia. AIDS 2002; 16:201–207.
11. Akileswaran C, Lurie MN, Flanigan TP, Mayer KH. Lessons learned from use of highly active antiretroviral therapy in Africa. Clin Infect Dis 2005; 41:376–385.
12. Rosen S, Fox MP, Gill CJ. Patient retention in antiretroviral therapy programs in sub-Saharan Africa: a systematic review. PLoS Med 2007; 4:e298.
13. Moore DM, Mermin J, Awor A, Yip B, Hogg RS, Montaner JS. Performance of immunologic responses in predicting viral load suppression: implications for monitoring patients in resource-limited settings. J Acquir Immune Defic Syndr 2006; 43:436–439.
14. Bisson GP, Gross R, Strom JB, Rollins C, Bellamy S, Weinstein R, et al. Diagnostic accuracy of CD4 cell count increase for virologic response after initiating highly active antiretroviral therapy. AIDS 2006; 20:1613–1619.
15. Chaiwarith R, Wachirakaphan C, Kotarathititum W, Praparatanaphan J, Sirisanthana T, Supparatpinyo K. Sensitivity and specificity of using CD4+ measurement and clinical evaluation to determine antiretroviral treatment failure in Thailand. Int J Infect Dis 2007; 11:413–416.