Randomized controlled trials are recognized as the gold standard for directly comparing different treatment strategies, in that they compare groups of patients who are essentially similar in all characteristics (known and unknown) other than treatment received. To maintain this fundamental comparability, it is recognized that all randomized patients should be included in the analysis: the intention-to-treat principle. However, even when this principle is adopted, the ways in which endpoints are defined, patients are followed up and analyses are conducted can have a substantial impact on trial results. These issues are rarely explicitly discussed, whether in trial reports, in treatment guidelines or even in textbooks. An appreciation of these issues is essential for understanding the likely impact of transferring trial strategies into clinical practice.
An area where the definition of endpoints has undergone considerable change is HIV infection, and the choice of regimen with which to initiate antiretroviral therapy is an important clinical decision that we will use to illustrate these issues. More than other decisions in HIV clinical practice, this choice of first-line therapy relies largely on results from trials comparing virologic outcomes in groups randomized to start with different regimens. Such trials are few in number and, for some relevant comparisons (e.g., lopinavir/r versus efavirenz against a common background), no substantive trials are currently available. Because such comparisons can then only be made indirectly, via common comparator regimens in other trials, it is all the more important that results from individual trials are generated as uniformly as possible, or at least that differences between them are understood. We compare the properties of two intention-to-treat virologic-based endpoints for these trials and explain why we believe that the choice of endpoint, follow up strategy and analysis of many key trials is such that they cannot provide the full information required for applying the results in clinical practice. Many of the issues we raise have been highlighted by others [1–4] but appear not to have been widely appreciated.
Virologic failure as an endpoint
We first consider virologic failure as an endpoint. There is wide consensus that, in order to minimize the risk of development of resistance mutations, the aim of initial antiretroviral therapy should be to achieve viral load below 50 HIV RNA copies/ml within 24 weeks at most, and that viral load should be maintained at or close to this level [4,5]. In those who have achieved this suppression there is a lack of agreement about precisely what level of subsequent increase constitutes ‘virologic failure’. This is an issue both for trial endpoints and for clinicians in routine practice deciding whether and/or when to change drugs. Some define virologic failure by two repeated values above 50 HIV RNA copies/ml [6,7] while others use 200 HIV RNA copies/ml or 400 HIV RNA copies/ml, and this has been driven partly by the lower limit of quantification of different viral load assays. The reason for the uncertainty concerning what constitutes virologic failure is that viral load naturally fluctuates and viral load assays have some associated measurement imprecision [9–11]; for example, repeated values above 50 HIV RNA copies/ml are commonly followed by values below 50 HIV RNA copies/ml despite no change in regimen [10,12]. Of note, few randomized trials have assessed the impact of different virologic failure criteria on subsequent outcome, and therefore the impact of different definitions of this endpoint on long-term treatment success is unknown. Further, although not considered here, the pre-eminence of virologic failure as an endpoint is based on biological mechanisms of viral infectious diseases, and other failure endpoints, such as immunological (CD4 cell count) failure, have rarely been considered.
However defined, whether or not virologic failure has occurred by a given point in time represents a reasonable choice for the primary virologic-based endpoint for trials, although it has not been used widely. The most commonly quoted reason for not using it is that in some individuals the original regimen may have been switched, due to toxicity or convenience, without virologic failure having occurred. This is known to be a frequent occurrence in routine clinical practice [13,14] where patients often take several months to settle on a convenient regimen that suits them without any problematic toxicities. Where occurrence of virologic failure is the endpoint, then such switching is ignored for the purposes of ascertaining the endpoint, and follow up continues on the new regimen(s) to ascertain whether virologic failure occurs subsequently (e.g., in ). The precise definition of the endpoint is whether virologic failure has occurred to any regimen used since starting therapy, not necessarily the initial one. Estimates from routine clinical practice suggest virologic failure rates of around 20% by 2 years for people initiating therapy with three or more antiretroviral drugs, much lower than the proportion who have switched any drug in their regimens by this time [13,14,16,17].
This last issue is judged by some to be a disadvantage of virologic-based endpoints which ignore switches from the original regimen, due to concern that the comparison between the original regimens could have been distorted by allowing use of other regimens which might influence the endpoint. Such concerns bring up fundamental issues about what questions randomized trials are actually able to address. In general, trials can only answer questions in an unbiased fashion about the strategy of starting with one regimen versus the strategy of starting with another (where the strategy might include guidance as to what to switch to for toxicity or convenience), as it is only the groups of randomized patients at the start of a trial that are essentially similar. Anything that happens to individual patients after the start of the trial (such as treatment switches) occurs both because of the kind of patient they are and because of the specific treatment they have received, and therefore patients who switch treatments in different randomized groups are unlikely to be comparable. The estimate of the treatment effect from comparing strategies as randomized is commonly called the effectiveness of a treatment strategy; i.e., what improvement would likely be observed in clinical practice if all patients started therapy with one treatment regimen rather than the other? Generalizing such effectiveness estimates into clinical practice clearly assumes that similar changes from randomized treatment regimen would be likely to occur inside and outside the clinical trial.
In contrast, the efficacy of a treatment strategy is the treatment effect that would have been observed if no one had had to switch from randomized treatment. Unfortunately, simple approaches to estimating efficacy (often referred to as ‘on treatment’ analyses), such as restricting comparisons to patients who have not changed from their randomized treatment, are usually biased because patients who switch treatment are not representative of either randomized group and switching is unlikely to occur at random. For example, in the Concorde trial of immediate versus deferred zidovudine, stopping blinded trial drug and starting open label zidovudine was a far more important predictor of disease progression than treatment received: notably, the hazard of ARC/AIDS/death was 1.6 times higher for participants in the deferred group who had received zidovudine compared to those who had not, even after adjusting for more recent CD4 cell counts. Thus an ‘on treatment’ analysis, excluding patients who stopped blinded trial drug, would preferentially remove patients with poorer prognosis in the deferred group from any treatment comparison, diluting any treatment difference. Although instinctively ‘on treatment’ analysis feels as if it should be more representative of clinical practice, subtle disparities between excluded patients can totally distort estimates of treatment effect.
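The selection effect described above can be illustrated with a small deterministic sketch. All numbers here are hypothetical and chosen only for illustration (they are not taken from Concorde or any other trial): two arms have the same mix of good- and poor-prognosis patients and the same failure risks, so there is no true treatment difference, but in one arm some poor-prognosis patients switch and are excluded from the ‘on treatment’ comparison.

```python
def expected_failure_rate(groups):
    """Expected proportion failing, given (n_patients, failure_risk) pairs."""
    n_total = sum(n for n, _ in groups)
    expected_failures = sum(n * risk for n, risk in groups)
    return expected_failures / n_total


# Hypothetical failure risks by underlying prognosis (illustrative, not trial data).
GOOD, POOR = 0.10, 0.40

# As randomized: both arms have the same prognostic mix and the same risks,
# so the true treatment difference is zero.
arm1_as_randomized = [(50, GOOD), (50, POOR)]
arm2_as_randomized = [(50, GOOD), (50, POOR)]

# 'On treatment': assume 30 poor-prognosis patients in arm 1 switch and are
# excluded, while nobody switches in arm 2.
arm1_on_treatment = [(50, GOOD), (20, POOR)]
arm2_on_treatment = arm2_as_randomized

itt_difference = (expected_failure_rate(arm1_as_randomized)
                  - expected_failure_rate(arm2_as_randomized))
on_treatment_difference = (expected_failure_rate(arm1_on_treatment)
                           - expected_failure_rate(arm2_on_treatment))

print(f"intention to treat difference: {itt_difference:+.3f}")
print(f"on treatment difference:       {on_treatment_difference:+.3f}")
```

Under these assumptions the comparison as randomized shows no difference, whereas the ‘on treatment’ comparison makes arm 1 look better simply because its poorer-prognosis patients have been removed. Depending on who switches in which arm, the same mechanism can equally dilute or reverse a real difference.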
Trials can only produce unbiased estimates of the efficacy of two original regimens in isolation if essentially all people stay on those regimens for long enough. In a situation, such as we have, where switching of drugs due to toxicity and convenience is common in routine practice and trials alike, we can only hope to answer the question about strategy—that is, effectiveness. Indeed, this is the only question relevant to clinical practice, where patients will generally not remain on drugs causing them significant toxicity or inconvenience, and the answer to the question ‘which regimen is better if all patients are forced to take it?’ is not much use. Further, at the point at which a patient decides to initiate therapy a clinician is unlikely to be able to predict what will happen to this specific patient in the future in terms of this regimen (or they wouldn't start them on this regimen in the first place), and so it is only patients as randomized, with whom such a new patient is directly comparable, that can provide the best information about future response.
However, this doesn't mean that treatment switches should be ignored completely—particular attention should always be paid to which regimens were actually used during follow up in the two arms when interpreting trial results. A difference, or lack of it, in the proportion with virologic failure could be strongly influenced by the regimens that are used subsequent to the original ones and this must be considered when deciding whether to adopt a strategy in clinical practice. In trial design, it is also helpful if recommendations on which new drugs should be used in people switching form part of the strategy that is being evaluated. It should be noted that these issues are not just relevant for the virologic failure endpoint, but for any endpoint which involves ignoring switches from the original regimen.
Regimen failure as an endpoint
A large proportion of substantive trials in naive patient populations are performed primarily for the purposes of obtaining licensing approval for a drug. Authorities responsible for granting such licences ask that analyses using certain endpoint definitions are included in submissions for drug approval. For example, there is a requirement by the Food and Drug Administration (FDA) that submissions include analyses of regimen failure (termed ‘loss of virologic response’ in the FDA guidance) by 48 weeks. According to the FDA definition, virologic failure is only one cause of regimen failure. Introduction of any new antiretroviral drug (except for switches in the background drugs in the regimen due to toxicity or intolerance) is also considered as regimen failure. Thus, if the drug of interest in the original regimen is switched due to toxicity or convenience, despite virologic failure not having occurred, then this is considered as regimen failure in the same way as virologic failure is. Since licensing authorities do not currently generally require analyses of endpoints that use information from people who have switched to a new drug, such information (including laboratory tests like viral load and CD4 cell counts) is often not even collected, or is collected but not analysed and reported. Thus, it is relatively rare for analyses using the virologic failure endpoint described above, or any other endpoint that uses data on people who have switched drugs, to be reported from trials which were designed for licensing purposes.
This regimen failure endpoint is widely referred to as an intention to treat: missing = failure or discontinuation = failure endpoint. It is missing = failure or discontinuation = failure because people who have discontinued drugs in their original regimen, or who have not been followed up because they have discontinued drugs in their original regimen (and therefore have missing data), are counted as failures. As we pointed out, switching of drug regimens due to toxicity and convenience tends to be more common than virologic failure. This means that in most trials, the proportion of regimen failures that is due to virologic failure tends to be below 50% and is often considerably lower. For example, in CNAB 3005, comparing abacavir with indinavir, virologic failures represented less than 20% of all regimen failures, in ACTG 384 the proportion was around 30% and in 2NN it was around 40–50%. This means that this endpoint tends to reflect the toxicity and convenience of the starting regimen rather more than the risk of virologic failure associated with the strategy of starting therapy with that regimen.
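The contrast between the two endpoint definitions can be sketched per patient as follows. This is a simplified illustration, not the FDA algorithm: the `Patient` fields and function names are hypothetical, switches in background drugs are not modelled, and the handling of missing data under the virologic failure endpoint is not specified in the text and is left out.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    virologic_failure: bool      # virologic failure recorded on any regimen by the analysis time
    switched_primary_drug: bool  # drug of primary interest switched (toxicity, convenience)
    missing_viral_load: bool     # no viral load available at the analysis time


def is_virologic_failure(p: Patient) -> bool:
    """Virologic failure endpoint: switches are ignored and follow up continues."""
    return p.virologic_failure


def is_regimen_failure(p: Patient) -> bool:
    """Regimen failure endpoint: missing = failure and discontinuation = failure."""
    return p.virologic_failure or p.switched_primary_drug or p.missing_viral_load


# Four hypothetical patients.
patients = [
    Patient(True, False, False),   # virologic failure on the original regimen
    Patient(False, True, False),   # suppressed, switched for toxicity, still followed
    Patient(False, True, True),    # switched, then no further viral loads collected
    Patient(False, False, False),  # suppressed on the original regimen
]

n_virologic = sum(is_virologic_failure(p) for p in patients)
n_regimen = sum(is_regimen_failure(p) for p in patients)
print(f"virologic failures: {n_virologic}, regimen failures: {n_regimen}")
```

In this toy example only one of the three regimen failures is a virologic failure, which mirrors the point that the composite endpoint is often dominated by switches rather than by virologic outcome.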
Although trial protocols lay down guidance criteria for levels of toxicity that are sufficient to lead to drug switching, there is inevitably a strongly subjective element, with clinicians and patients differing in their threshold for switching for a given level of toxicity or of inconvenience, particularly when clinical practice suggests toxicity may be transient. This is a negative feature of an endpoint, especially in a situation where a trial is not blinded through the use of placebos, or even where it is blinded but there are clear differences in toxicity profile that compromise the blinding. This is because there is potential for bias to be introduced through selective switching in one group due to increased perception of toxicity impact or convenience.
Another criticism of the regimen failure endpoint is that it treats virologic failure as equivalent to stopping due to toxicity or other reasons, yet the clinical implications are quite different. Stopping due to virologic failure, with the increased likelihood of having developed resistance mutations, is not the same as switching drugs in a person with viral load < 50 HIV RNA copies/ml to make the regimen more convenient or reduce toxicity. In the former case, there are implications for the future efficacy of other drugs in the classes that have failed; in the latter there are not, although drugs with a similar toxicity profile to the one that has been discontinued may be ruled out as future options.
Lastly, but perhaps most importantly, virologic endpoints were introduced as a shorter-term replacement for the long-term endpoints of AIDS or death. This was based on evidence that differences between trial arms in viral load outcomes were generally (although by no means completely) consistent with differences in clinical outcomes in the same trial. In this assessment of surrogacy, the viral load outcome did not treat switching of the original drug regimen as failure. Neither, of course, did the trials with clinical endpoints, such as ACTG 175, Delta or ACTG 320 [22–24], treat switching drugs as being equivalent to failing (i.e., developing AIDS). So, use of a switching = failure criterion loses sight of the primary reason for using virologic-based endpoints.
Table 1 lists some recently reported trials in antiretroviral naive people and indicates whether analyses which include post-switch data have been reported alongside analyses of the regimen failure endpoint, which all of the trials use. Of the nine trials listed, only two have, thus far at least, reported such analyses.
A different answer in a different setting
The subjective element to the regimen failure endpoint leaves open the possibility that the results of a trial might be different depending on the time and place in which it is conducted. Consider a situation in which we have two regimens, A and B, which have different types of toxicity profile—perhaps one raises lipids and the other causes rash and hepatic toxicity, or maybe one is associated with central nervous system toxicity. Standardized criteria for drug switching which are uniformly adhered to are impractical, so the decision whether to switch the drug of primary interest in a given regimen if such toxicity occurs partially depends on the clinician's perception of the importance of that toxicity. Although differences might not be large, at certain points in time, or in certain clinics, or certain regions of the world, there may be somewhat more sensitivity over the toxicities of regimen A than B, and at other times and places the reverse might be true. For example, there may be less concern over lipid rises in countries with low coronary heart disease incidence; or there may be a tendency to switch the drug at times when deaths relating to a certain toxicity have recently been highlighted in presentations or publications.
Let us consider a trial which has taken place in these two sets of circumstances (Table 2), and where regimen A truly has lower virologic efficacy than regimen B, reflected in the fact that there are 35 virologic failures on A and only 20 on B among the 200 patients in each arm (35/200 versus 20/200; P = 0.03, chi-square test). For simplicity, we refer to whether a regimen has been switched, but recognize that there is often a distinction between switches in the comparator drug of primary interest and switches in background drugs, for which switching due to toxicity is sometimes not considered as regimen failure. First, consider a situation in which there is more sensitivity to the toxicities of regimen A (top section of Table 2). As a consequence, more people stopped regimen A than B: 65 versus 50. So there are 65 switches due to toxicity/convenience plus 35 virologic failures on A, giving 100 regimen failures in all and leaving 100 (50%) without regimen failure by 48 weeks. Although somewhat misleading, this figure is usually expressed as the percentage with viral load < 50 HIV RNA copies/ml (intention to treat, missing = failure). This figure compares with 65% without regimen failure on B, so the overall result clearly favours B.
Suppose, on the other hand, the situation is exactly the same, with B having fewer virologic failures than A, but that now there is more sensitivity to toxicities of regimen B instead of A—so 65 switches on regimen B and only 50 on regimen A (bottom section of Table 2). Here there are a total of 85 failures on each arm, and thus there is no difference between the two groups in this endpoint, despite the fact that regimen B is actually associated with fewer virologic failures than regimen A.
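The arithmetic of this worked example can be checked with a short sketch. The counts are those quoted in the text; the assumption that switches and virologic failures do not overlap follows the way the text adds them, and the use of an uncorrected Pearson chi-square for the virologic failure comparison is an assumption on our part (it is the variant that reproduces the quoted P = 0.03).

```python
import math


def chi_square_2x2(events_a, n_a, events_b, n_b):
    """Pearson chi-square for a 2x2 table, without continuity correction.

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    total = n_a + n_b
    events = events_a + events_b
    non_events = total - events
    stat = 0.0
    for observed, n in ((events_a, n_a), (events_b, n_b)):
        expected_events = n * events / total
        expected_non_events = n * non_events / total
        stat += (observed - expected_events) ** 2 / expected_events
        stat += ((n - observed) - expected_non_events) ** 2 / expected_non_events
    # For 1 degree of freedom the upper tail probability is erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


N_PER_ARM = 200
VIROLOGIC_FAILURES = {"A": 35, "B": 20}

# Switches due to toxicity/convenience in the two settings described in the text.
SWITCHES = {
    "more sensitivity to toxicities of A": {"A": 65, "B": 50},
    "more sensitivity to toxicities of B": {"A": 50, "B": 65},
}


def regimen_failures(switches):
    """Regimen failures per arm = switches + virologic failures (assumed not to overlap)."""
    return {arm: switches[arm] + VIROLOGIC_FAILURES[arm] for arm in ("A", "B")}


stat, p_value = chi_square_2x2(VIROLOGIC_FAILURES["A"], N_PER_ARM,
                               VIROLOGIC_FAILURES["B"], N_PER_ARM)
print(f"virologic failure, A versus B: chi-square = {stat:.2f}, P = {p_value:.3f}")

results = {}
for setting, switches in SWITCHES.items():
    failures = regimen_failures(switches)
    results[setting] = {arm: 100 * (N_PER_ARM - failures[arm]) / N_PER_ARM
                        for arm in failures}
    print(f"{setting}: % without regimen failure = {results[setting]}")
```

The virologic failure comparison is identical in both settings; only the switch counts change, and that alone moves the regimen failure result from 50% versus 65% to 57.5% in both arms.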
This illustrates how the relative degree of sensitivity to one type of toxicity compared with another in a trial can swing the result one way or the other when composite endpoints such as regimen failure are used. This is a disturbing property of an endpoint, especially when it is generally interpreted by trial investigators and the medical community as a measure of virologic efficacy.
It is worth noting that the ‘on treatment’ analysis commonly used, which excludes those who have switched drugs, is also sensitive to the patterns of treatment switches and will often not pick up differences in virologic failure rates between regimens either. This is because it excludes patients who switch regimens in both treatment groups, and as described earlier, these patients are neither representative of these groups, nor comparable with each other.
It should be noted that there are alternative virologic endpoints to that of virologic failure, even though virologic failure has several advantages. The disadvantages of virologic failure as an endpoint include the fact that viral load changes after the first virologic failure are not accounted for, nor is any distinction made between virologic failure associated with documented evidence of resistance mutations and that which is associated with no resistance mutations. A recently proposed bivariate failure time endpoint combines the regimen failure and virologic failure endpoints. Other relevant endpoints in trials of antiretrovirals include CD4 cell count, quality of life, development of AIDS defining diseases and therapy adherence. In general, we feel that these are best treated as separate endpoints rather than combined into a composite endpoint, as results for a composite are difficult to interpret.
Further, here we have concentrated on virologic endpoints but the principle of continuing follow up despite switches in therapy is one that is important for properly determining other endpoints, such as CD4 cell count changes or incidence of clinical events. In addition, whilst we have used trials carried out in naive individuals as an example, these issues are relevant to any trials in any populations in which viral load suppression to below a specified limit is a common outcome.
Besides the choice of endpoints that we have highlighted, another issue which limits the usefulness of results from trials is their relatively short duration. Again, the 48-week standard length is dictated by regulatory authorities. Results from extended follow up over 2 or more years are only occasionally presented, although even then rates of loss to follow up can be of appreciable magnitude, limiting their usefulness.
Many trials in antiretroviral naive individuals are designed to meet criteria laid down by regulatory authorities for trials used in support of drug licensing. The focus of such designs is on attempting to isolate the effect of a specific drug, on the assumption that the full effects of having used a drug are realised while the drug is being taken, or within a few months of stopping. However, the results obtained are of less benefit for clinical practice than they would be if follow up of viral load and other outcomes in participants were continued and analysed regardless of whether they remain on the original trial regimen. We suggest that those designing future trials strongly consider maintaining patient follow up in this way, and interpret differences in virologic failure in the light of treatment switches rather than, or in addition to, directly including these switches in regimen failure endpoints.
We received helpful comments from C. Sabin, A. Cozzi Lepri, A. Mocroft, Z. Fox and Jens Lundgren.
Sponsorship: This manuscript arose from discussions within an MRC Co-operative Group (Grant G0000130).
1. Gilbert PB, DeGruttola V, Hammer SM, Kuritzkes DR. Virologic and regimen termination surrogate endpoints in AIDS clinical trials. JAMA
2. Kirk O, Pedersen C, Law M, Gulick RM, Moyle G, Montaner J, et al. Analysis of virological efficacy in trials of antiretroviral regimens: drawbacks of not including viral load measurements after premature discontinuation of therapy. Antiviral Ther
3. DiRienzo AG, DeGruttola V. Design and analysis of clinical trials with a bivariate failure time endpoint, with application to AIDS Clinical Trials Group Study A5142. Control Clin Trials
4. Department of Health and Human Services. Guidelines for the use of antiretroviral agents in HIV-infected adults and adolescents. http://www.aidsinfo.nih.gov
5. BHIVA Writing Committee on behalf of the BHIVA Executive Committee. British HIV Association guidelines for the treatment of HIV-infected adults with antiretroviral therapy. HIV Med
6. van Leth F, Hassink E, Phanuphak P, Miller S, Gazzard B, Cahn P, et al. for the 2NN study group. Results of the 2NN Study: A randomized comparative trial of first-line antiretroviral therapy with regimens containing either nevirapine alone, efavirenz alone or both drugs combined, together with stavudine and lamivudine. Tenth Conference on Retroviruses and Opportunistic Infections. Boston, February 2003 [abstract 176].
7. U.S. Department of Health and Human Services, Food and Drug Administration. Center for Drug Evaluation and Research (CDER). Guidance for Industry. Antiretroviral Drugs Using Plasma HIV RNA Measurements — Clinical Considerations for Accelerated and Traditional Approval. Oct 2002. http://www.fda.gov/cder/guidance/index.htm
8. Robbins G, Shafer R, Smeaton L, De Gruttola V, Pettinelli C, Snyder S, et al. Antiretroviral strategies in naïve HIV+ subjects: comparison of sequential 3-drug regimens (ACTG 384). XIV International Conference on AIDS. Barcelona, July 2002 [abstract LbOr20A].
9. Mazen Y, Pozniak AL, Pillay D, Mandalia S, Wildfire A, Gazzard BG. Evidence for low-level viral replication (<50 copies/mL) predicts eventual virological failure. Eighth Annual Conference of the British HIV Association. York, UK, April 2002 [abstract 011].
10. Greub G, Cozzi-Lepri A, Ledergerber B, Staszewski S, Perrin L, Miller V, et al. Intermittent and sustained low-level HIV viral rebound in patients receiving potent antiretroviral therapy. AIDS
11. Moore AL, Youle M, Lipman M, Cozzi-Lepri A, Lampe F, Madge S, et al. Raised viral load in patients with viral suppression on highly active antiretroviral therapy: transient increase or treatment failure? AIDS
12. Staszewski S, Sabin C, Dauer B, Cozzi Lepri A, Phillips AN. Definition of loss of virologic response in trials of antiretroviral drugs. AIDS
13. d'Arminio Monforte A, Lepri AC, Rezza G, Pezzotti P, Antinori A, Phillips AN, et al. Insights into the reasons for discontinuation of the first highly active antiretroviral therapy (HAART) regimen in a cohort of antiretroviral naïve patients. ICONA Study Group. AIDS
14. Mocroft A, Youle M, Moore A, Sabin CA, Madge S, Lepri AC, et al. Reasons for modification and discontinuation of antiretrovirals: results from a single treatment centre. AIDS
15. Dragsted U, Gerstoft J, Pedersen C, Peters B, Duran A, Obel N, et al. for the MaxCmin1 trial group. Randomized trial to evaluate indinavir/ritonavir versus saquinavir/ritonavir in human immunodeficiency virus type 1-infected patients: The MaxCmin1 trial. J Infect Dis 188
16. Ledergerber B, Egger M, Opravil M, Telenti A, Hirschel B, Battegay M, et al. Clinical progression and virological failure on highly active antiretroviral therapy in HIV-1 patients: a prospective cohort study. Lancet
17. Staszewski S, Miller V, Sabin CA, Carlebach A, Berger A-M, Weidemann E, et al. Virological response to protease inhibitor therapy in an HIV clinic cohort. AIDS
18. Schwartz D, Lellouch J. Explanatory and pragmatic attitudes in therapeutic trials. J Chron Dis
19. White IR, Walker AS, Babiker AG, Darbyshire JH. Impact of treatment changes on the interpretation of the Concorde trial. AIDS
20. Staszewski S, Keiser P, Montaner J, Raffi F, Gathe J, Brotas V, et al. Abacavir-lamivudine-zidovudine vs indinavir-lamivudine-zidovudine in antiretroviral-naive HIV-infected adults: A randomized equivalence trial. JAMA
21. HIV Surrogate Marker Collaborative Group. Human immunodeficiency virus type 1 RNA level and CD4 count as prognostic markers and surrogate endpoints: a meta-analysis. AIDS Res Hum Retroviruses
22. Hammer SM, Katzenstein DA, Hughes MD, Gundacker H, Schooley RT, Haubrich RH, et al. A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. AIDS Clinical Trials Group Study 175 Study Team. N Engl J Med
23. Hammer SM, Squires KE, Hughes MD, Grimes JM, Demeter LM, Currier JS, et al. A controlled trial of two nucleoside analogues plus indinavir in persons with human immunodeficiency virus infection and CD4 cell counts of 200 per cubic millimeter or less. AIDS Clinical Trials Group 320 Study Team. N Engl J Med
24. Delta Co-ordinating Committee. Delta: A randomised double-blind controlled trial comparing combinations of zidovudine plus didanosine or zalcitabine with zidovudine alone in HIV-infected individuals. Lancet
25. Squires K, Thiry A, Giordano M, et al. Atazanavir (ATV) qd and efavirenz (EFV) qd with fixed-dose ZDV+3TC: comparison of antiviral efficacy and safety through wk 24 (AI424-034). 42nd Interscience Conference on Antimicrobial Agents and Chemotherapy. San Diego, September 2002 [abstract 1076].
26. Vibhagool A, Cahn P, Schechter M, Soto-Ramirez L, Montroni M, Smaill F, et al. Abacavir/Combivir (ABC/COM) is comparable to indinavir/Combivir (IDV/COM) in HIV-1-infected antiretroviral therapy naïve adults: Preliminary results of a 48-week open-label study (CNA 3014). First IAS Conference on HIV Pathogenesis and Treatment. Buenos Aires, July 2001 [abstract 63].
27. Walmsley S, Bernstein B, King M, Arribas J, Beall G, Ruane P, et al. Lopinavir-ritonavir versus nelfinavir for the initial treatment of HIV infection. N Engl J Med
28. Staszewski S, Morales-Ramirez J, Tashima KT, Rachlis A, Skiest D, Stanford J, et al. Efavirenz plus zidovudine and lamivudine, efavirenz plus indinavir, and indinavir plus zidovudine and lamivudine in the treatment of HIV-1 infection in adults. Study 006 Team. N Engl J Med
29. Schurmann D, Gathe J, Sanne I, Wood R. Efficacy and safety of GW433908/ritonavir once daily in therapy-naive subjects, 48 weeks results: The SOLO study. Sixth International Congress on Drug Therapy in HIV Infection. Glasgow, November 2002 [abstract PL14.4].