Because highly active antiretroviral therapy (HAART) cannot eradicate HIV-1 and has many side effects, the initial “hit early and hard” strategy has been modified. Decisions on when to initiate HAART are now based on how much the immune system can recover afterward, what side effects can be expected, and what will happen if HAART is withheld. Markers and their relation to disease progression in HAART-naive individuals provide important information for answering the last question. Initiation of HAART has been recommended for persons with asymptomatic HIV infection in whom the estimated probability of developing AIDS within 3 years is more than 30% (^{1,2}). This recommendation was based on a study that computed Kaplan-Meier survival curves stratified by categories of plasma HIV-1 RNA and CD4 lymphocyte count measured “at baseline” (^{3}), with baseline defined as “1 to 1.5 years after enrollment.” Because time since enrollment instead of time since seroconversion formed the principal time scale, that study assumed that predicted survival did not depend on time since seroconversion (i.e., that all information on disease progression was captured by the markers included in the model). The purpose of the study presented here was to extend this model in several ways. First, the baseline time point was defined by time since seroconversion rather than by time since enrollment. Second, data at different baseline time points were combined and analyzed together, which increases power and allows estimation of covariate effects that change over time since seroconversion. Furthermore, we included other markers and cofactors that have been shown to influence time to AIDS and death.

From the fitted model, an individual survival curve of residual time to AIDS or death is easily predicted for any combination of measured marker and cofactor values. Some examples are given in this article.

Another aim was to quantify, via indices of predictive value, how well the markers and cofactors predict at the individual level and which combinations perform best in this respect. The predictive value of markers with respect to AIDS and death is usually assessed by estimation of RRs and computation of *P* values (^{3–6}). Nevertheless, *P* values indicate only how strongly the data contradict the null hypothesis that the covariables under investigation have no effect; for prediction, it is important to quantify how much an individual's predicted outcome deviates from his actual outcome (^{7,8}). We used two measures of predictive value at the individual level. We also investigated how much the extra covariables, in addition to CD4 lymphocyte count and HIV-1 RNA load, increase the predictive value.

#### DATA AND METHODS

##### Data

Data were obtained from the Amsterdam Cohort Study (ACS) among homosexual men (^{9,10}). In the ACS, HIV-positive persons are seen once every 3 months and blood samples are taken at each visit. Data from follow-up in hospitals are included. The study was approved by the medical ethical committee, and informed consent was obtained from all participants. Included in our study were persons who either seroconverted for HIV-1 during ACS follow-up or were already infected when they entered the ACS between October 1984 and April 1985. For the latter group, we estimated the date of seroconversion via conditional mean imputation, based on a cohort-specific seroconversion distribution (^{10,11}). July 1996 was taken as the cutoff date of our analysis, because HAART became widely available after that time and greatly influenced survival prognosis.

Apart from CD4 lymphocyte count (model variable: cd4; unit: cells/μL) and serum HIV-1 RNA load (model variable: load; unit: copies/mL), we included syncytium-inducing (SI) HIV-1 phenotype (model variable: si; values SI and non-SI [NSI]) as a measure of HIV virulence (^{12}). We also used patient age (^{13}) and two genetic cofactors shown to influence progression to AIDS (^{14,15}): the chemokine receptor alterations CCR5-Δ32 (^{16}) (model variable: ccr5; values wild type and heterozygous) and CCR2b-64I (^{17}) (model variable: ccr2; values wild type and heterozygous/homozygous for the CCR2b-64I allele). Calendar period was included to correct for the effects of the increasing availability of treatment before the HAART era.

At each year from 1 to 8 years after seroconversion, the median individual CD4 count and HIV-1 RNA load over the previous year were computed. For HIV-1 phenotype, we first imputed some of the missing values. Because persons who test positive for SI variants usually remain positive for such variants at subsequent test points (^{18}), a missing value was considered to be SI if the individual had a previous SI test result. Likewise, it was considered to be NSI if the individual had a subsequent NSI test result. If the time span between an NSI value and a subsequent SI value was less than 2 years, the date of the switch to SI was estimated via conditional mean imputation based on the cumulative incidence curve of the time from HIV-1 seroconversion to the switch to SI. After the imputation, the last known value before the end of each year after seroconversion was recorded. Calendar period of measurement was split into three categories (before May 1988, from May 1988 to July 1991, and after July 1991), which correspond to changes in the method of measuring CD4 count. They also roughly coincide with the start of zidovudine (AZT) monotherapy (April 1987) and of combined zidovudine and zalcitabine (ddC) bitherapy (September 1991) in our cohort.
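The phenotype carry-forward and carry-back rules described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes each person's phenotype results are stored as an ordered list of visit values (`"SI"`, `"NSI"`, or `None` for missing), and the helper names `impute_phenotype` and `last_known_before` are hypothetical. The conditional mean imputation of the switch date for short NSI-to-SI gaps is not shown; such gaps are simply left missing.

```python
from typing import List, Optional, Sequence


def impute_phenotype(tests: Sequence[Optional[str]]) -> List[Optional[str]]:
    """Fill missing phenotype values on an ordered visit sequence (hypothetical helper).

    Rule 1: a missing value after an earlier SI result becomes SI (SI persists).
    Rule 2: a missing value before a later NSI result becomes NSI.
    Gaps covered by neither rule (e.g. between NSI and a later SI) stay None.
    """
    out = list(tests)
    # Forward pass: carry SI forward.
    seen_si = False
    for i, value in enumerate(out):
        if value == "SI":
            seen_si = True
        elif value is None and seen_si:
            out[i] = "SI"
    # Backward pass: carry NSI backward, based on the observed (not imputed) tests.
    later_nsi = False
    for i in range(len(out) - 1, -1, -1):
        if tests[i] == "NSI":
            later_nsi = True
        elif out[i] is None and later_nsi:
            out[i] = "NSI"
    return out


def last_known_before(times: Sequence[float], values: Sequence[Optional[str]],
                      year_end: float) -> Optional[str]:
    """Return the last non-missing value measured no later than year_end."""
    result = None
    for t, v in sorted(zip(times, values), key=lambda pair: pair[0]):
        if t <= year_end and v is not None:
            result = v
    return result


print(impute_phenotype([None, "NSI", None, None, "SI", None]))
# ['NSI', 'NSI', None, None, 'SI', 'SI']
```

The two missing values between the NSI and SI results stay `None`, which is where the switch-date imputation described in the text would apply.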

If a person had all variables, except viral load, measured in a specific year, this record was included in the data set. Hence, persons may contribute up to eight records. In total, 280 persons were included in the analyses, with a total of 1401 complete (i.e., all variables measured) records; 120 persons were seroconverters with a seroconversion interval less than 2 years. The analyses that included load were based on smaller numbers (957 complete records from 275 persons). A summary of the data for each year since seroconversion is given (Table 1).
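How one person can contribute up to eight baseline records can be illustrated with the sketch below. It assumes that visits are given as `(years_since_seroconversion, cd4, load)` tuples, that “the previous year” for baseline year *k* is the interval (*k* − 1, *k*], and that the other required variables (phenotype, age, genotype, calendar period) are available whenever CD4 is; the function name `baseline_records` is hypothetical.

```python
from statistics import median
from typing import Dict, List, Optional, Sequence, Tuple

Visit = Tuple[float, Optional[float], Optional[float]]  # (years since SC, CD4, load)


def baseline_records(visits: Sequence[Visit], max_year: int = 8) -> List[Dict]:
    """Build one record per baseline year 1..max_year from the previous year's visits."""
    records = []
    for year in range(1, max_year + 1):
        window = [v for v in visits if year - 1 < v[0] <= year]
        cd4 = [v[1] for v in window if v[1] is not None]
        load = [v[2] for v in window if v[2] is not None]
        if not cd4:
            continue  # no CD4 measured in this year: no record
        records.append({
            "year": year,
            "cd4": median(cd4),
            # load may be missing; such records enter only the analyses without load
            "load": median(load) if load else None,
        })
    return records


visits = [(0.3, 600, 20000), (0.6, 560, None), (0.9, 540, 30000),
          (1.4, 500, None), (2.5, None, 40000)]
for rec in baseline_records(visits):
    print(rec)
# {'year': 1, 'cd4': 560, 'load': 25000.0}
# {'year': 2, 'cd4': 500, 'load': None}
```

In this toy example, year 3 yields no record because no CD4 count was measured in that interval, and the year 2 record has a missing load, so it would enter only the analyses without load.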

##### Laboratory Methods

All CD4 lymphocyte counts were obtained prospectively. Before May 1988, we used single indirect immunofluorescence staining on Ficoll-isolated peripheral blood mononuclear cells (PBMCs). This was replaced by double direct staining thereafter. The flow cytometer was replaced by a FACScan in 1991. Each day, samples were compared with values from healthy HIV-negative controls. HIV-1 RNA was quantified by use of a nucleic acid sequence–based amplification technique (NASBA HIV-1 RNA QT; Organon Teknika, Boxtel, The Netherlands), with a quantification threshold of 1000 copies/mL. All HIV-1 RNA values were obtained retrospectively from serum at least once a year for the seroconverters and less often and more irregularly for the seroprevalent cases. To determine HIV-1 phenotype, HIV-1 was isolated from fresh or cryopreserved PBMCs that had been obtained from participants and cocultivated with MT-2 lymphoblastoid cells (^{19}). Isolates producing syncytia in MT-2 cells were considered to be SI. The detection of SI variants in ACS participants has been carried out routinely since July 1992 (^{18}). For persons who were SI at that time, back testing from earlier points in time was performed.

##### Statistical Methods

The basic approach of the analysis is similar to the one used by de Wolf et al. (^{4}): for each year after seroconversion, the effect of the median/last marker value over the previous year (the “baseline” value) on the residual time to AIDS or death was analyzed in a Cox model. The difference is that all (eight) baseline data sets were merged and analyzed in one model, allowing for more powerful analyses and the inclusion of the effect of time since seroconversion (which is represented by the baseline year) as a covariate.

The baseline years were included as a linear effect, with a change in slope at year 4. They model the effect of time since seroconversion and account for the amount of disease progression that is not captured by the markers. The fourth root of cd4 and the logarithm of load were included as linear effects. Load values below the quantification threshold were set at 1000 copies/mL; that is, we assumed that any value below 1000 copies/mL has the same effect on time to AIDS and death as the value of 1000 copies/mL itself. (An extra binary covariable for “undetectable viral load” was not significant at the 0.05 level.) Because a change from NSI to SI phenotype is known to accelerate CD4 decline (^{12}), and because little further decline is possible at a low CD4 count, we modeled an interaction effect between cd4 and si. We included interactions between ccr5 and si (^{20}) and between cd4 and load (^{5}). We allowed for an interaction effect between calendar period and cd4, because until 1996, the decision to start treatment was largely based on CD4 count. Interactions between baseline year and both load and cd4 were modeled to see whether the effect of either marker on the risk of AIDS changes over time since seroconversion. A robust estimator of the SE of the RRs was used (^{21,22}) to correct for the fact that a person may appear up to eight times in the data set. The predicted distribution of residual time to AIDS was obtained by combining the estimated regression coefficients with the estimated cumulative baseline hazard.
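To make the covariate coding concrete, the sketch below builds one row of the design matrix for a baseline record. It is an illustration under stated assumptions, not the authors' S-Plus code: the logarithm of load is taken as log10 (consistent with the log-load values of 4.5 and 5.1 quoted in the Results), the interactions with baseline year use the linear year term only, and age, ccr5, ccr2, and the ccr5 × si interaction are omitted for brevity. The function and column names are hypothetical.

```python
import math

PERIODS = ("pre_1988", "1988_1991", "post_1991")  # first level is the reference


def design_row(year: int, cd4: float, load: float, si: str, period: str) -> dict:
    """Covariates for one baseline record (hypothetical coding, log10 load assumed)."""
    cd4_r = cd4 ** 0.25                        # fourth root of CD4 count
    log_load = math.log10(max(load, 1000.0))   # values below 1000 copies/mL set to 1000
    si_d = 1.0 if si == "SI" else 0.0
    row = {
        "year": float(year),                   # linear effect of baseline year
        "year_after_4": max(year - 4.0, 0.0),  # extra slope after the knot at year 4
        "cd4": cd4_r,
        "load": log_load,
        "si": si_d,
        "cd4_x_si": cd4_r * si_d,
        "cd4_x_load": cd4_r * log_load,
        "cd4_x_year": cd4_r * year,
        "load_x_year": log_load * year,
    }
    for level in PERIODS[1:]:                  # dummy coding plus cd4 interaction
        d = 1.0 if period == level else 0.0
        row[f"period_{level}"] = d
        row[f"cd4_x_period_{level}"] = cd4_r * d
    return row


row = design_row(year=6, cd4=256, load=500, si="SI", period="post_1991")
print(row["year_after_4"], row["cd4"], row["load"])
```

The pair `year` and `year_after_4` is the usual way to encode a piecewise linear effect: the coefficient of `year` is the slope before year 4, and the sum of both coefficients is the slope after it.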

We considered two different end points: the presence of AIDS according to the European 1993 AIDS case definition (^{23}), which excludes cases based solely on a CD4 count below 200 cells/μL, and the occurrence of death after AIDS diagnosis. Because residual time to death was modeled for persons who were free of AIDS-defining conditions, baseline measurements after AIDS diagnosis were excluded. In total, four “full” models were analyzed: models I.A and I.D investigate time to AIDS and death for the full set of covariables, and models II.A and II.D do the same after exclusion of load, using the larger data set. In addition, we investigated the predictive value of subsets of markers and cofactors by fitting, for each subset, a model that included only those covariables.

Two indices of predictive accuracy were used: Somers' rank correlation, *R*, and explained variation, *EV*. The latter compares, for each individual, his predicted probability of developing AIDS within 3 years (or of dying within 5 years) with whether that person actually experienced the event before that time (a 0-1 variable). If *D*_{x} is the average deviation between predicted and observed outcome in a model with covariables and *D* is the corresponding deviation without covariables, *EV* is defined as (*D* − *D*_{x}) / *D* (^{24}). *EV* is related to *R*² in linear models. The other index, Somers' rank correlation *R*, measures the average concordance between predicted and observed outcome for each pair of individuals; if the person with the higher predicted survival also survives longer, the pair is concordant. Formally, it is defined as *R* = 2(*c* − 0.5), in which *c* is the estimated probability of concordance. For binary outcomes, *c* equals the area under the receiver operating characteristic (ROC) curve (^{25}). In a model with high predictive value, *EV* and *R* are close to 1, whereas they are 0 in a model without any predictive value. The indices of predictive accuracy were validated by a clustered bootstrap procedure (^{26}), which corrects for overoptimism in predictive value caused by overfitting.
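The two indices can be sketched for the simple case of a fixed horizon with complete follow-up, so that each person's outcome is a 0-1 indicator. This simplification ignores censoring, which the actual analysis must handle, and it assumes absolute deviation as the loss in *D*; the helper names are hypothetical. The covariate-free model predicts the overall event proportion for everyone, and *c* is computed over all pairs with different outcomes, counting tied predictions as half-concordant.

```python
from itertools import combinations
from typing import Sequence


def explained_variation(pred: Sequence[float], obs: Sequence[int]) -> float:
    """EV = (D - D_x) / D with mean absolute deviation as the (assumed) loss."""
    n = len(obs)
    prevalence = sum(obs) / n  # prediction of the model without covariables
    d_null = sum(abs(y - prevalence) for y in obs) / n
    d_model = sum(abs(y - p) for y, p in zip(obs, pred)) / n
    return (d_null - d_model) / d_null


def somers_r(pred: Sequence[float], obs: Sequence[int]) -> float:
    """R = 2(c - 0.5), c = probability that the person with the event had the higher risk."""
    concordant = 0.0
    usable = 0
    for (p_i, y_i), (p_j, y_j) in combinations(zip(pred, obs), 2):
        if y_i == y_j:
            continue  # pairs with equal outcomes carry no ranking information
        usable += 1
        p_event, p_free = (p_i, p_j) if y_i == 1 else (p_j, p_i)
        if p_event > p_free:
            concordant += 1.0
        elif p_event == p_free:
            concordant += 0.5  # tied predictions count as half-concordant
    c = concordant / usable
    return 2.0 * (c - 0.5)


pred = [0.9, 0.8, 0.3, 0.1]  # predicted probability of AIDS within 3 years
obs = [1, 1, 0, 0]           # 1 = AIDS observed within 3 years
print(round(explained_variation(pred, obs), 2), somers_r(pred, obs))  # 0.65 1.0
```

A model that assigns everyone the same probability gets *R* = 0, matching the statement that both indices are 0 without predictive value. With an absolute-deviation loss, *EV* can even become negative for a poorly calibrated model.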

All analyses were performed with S-Plus (Mathsoft, Data Analysis Products Division, Seattle, WA, U.S.A.) (^{27}) together with the S-Plus Design library (^{28}).

#### RESULTS

##### Relative Risks and Predicted Time to AIDS and Death

For any combination of a person's marker and cofactor values, a curve representing his distribution of residual time to AIDS or death can be obtained. Curves of residual time to AIDS are given for selected combinations of marker values based on the model that excluded load (Fig. 1).

For example, from the upper graph, we conclude that a 38-year-old person who is wild type for both genetic cofactors and has 200 CD4 cells/μL and an SI phenotype, measured 4 years after seroconversion and between 1991 and 1996, has about a 35% probability of remaining AIDS-free for more than 2 years. If this person had an NSI instead of an SI phenotype, this probability would be 62%. An interaction effect between cd4 and si is clearly visible: for high CD4 counts, the values NSI and SI yield quite different curves, but this difference vanishes at low CD4 counts. The middle graph depicts the effect of years since seroconversion. At a low CD4 count, residual time to AIDS is longer in the first 2 years after seroconversion. In the lower graph, we see an effect of calendar period. Compared with the period before 1988, the years between 1988 and 1991 show an increase in survival of about 1 year at a CD4 count of 500 cells/μL. After 1991, the increase in survival is up to several years. At low CD4 counts, no calendar period effect is visible.
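Because the curves come from a Cox model, two covariate profiles that differ only in phenotype have survival probabilities related by *S*_{SI}(*t*) = *S*_{NSI}(*t*)^{HR} under the proportional hazards assumption. The short calculation below is a sketch, not a result reported in this study: it uses the two 2-year probabilities quoted above (35% for SI and 62% for NSI) to back out the implied hazard ratio at 200 CD4 cells/μL, which combines the si main effect and the cd4 × si interaction. The helper names are hypothetical.

```python
import math


def implied_hazard_ratio(surv_a: float, surv_b: float) -> float:
    """Under proportional hazards S_a(t) = S_b(t) ** HR, hence HR = log S_a / log S_b."""
    return math.log(surv_a) / math.log(surv_b)


def survival_from_baseline(baseline_surv: float, log_hr: float) -> float:
    """Cox-model prediction S(t | x) = S_ref(t) ** exp(log_hr)."""
    return baseline_surv ** math.exp(log_hr)


hr = implied_hazard_ratio(0.35, 0.62)   # SI versus NSI at 200 CD4 cells/uL, t = 2 years
print(round(hr, 2))                     # 2.2
print(round(survival_from_baseline(0.62, math.log(hr)), 2))  # 0.35
```

Under these assumptions, an SI phenotype at this CD4 count corresponds to roughly a doubling of the hazard of AIDS.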

##### Statistical Significance

Although the main interest was in predictive value, we briefly summarize statistical significance. In all four full models, highly significant (*P* < 0.01) main effects were found for cd4, load, si, and ccr5 (except *P* = 0.12 in model I.D); years since seroconversion; and calendar period (*P* < 0.02 in models I.A and I.D). The interactions are consistent with the curves described above: cd4 had a significant interaction with si (*P* < 0.05), years since seroconversion (*P* < 0.01, but *P* = 0.07 in model I.D), and calendar period (*P* < 0.04). The interaction effect between load and years since seroconversion had *P* values of 0.38 (AIDS) and 0.02 (death). No significant effects were found for age and ccr2. No significant interaction effects were found between load and cd4 or between si and ccr5.

##### Predictive Value of Markers and Cofactors

Next, we looked at the predictive value of combined subsets of markers and cofactors. In a tree diagram (Fig. 2), the effect of the subsequent addition of markers and cofactors on predictive value is given for both AIDS and death as an end point.

Because age is always known to the clinician, all models included age. For example, if we know only age and cd4 (+CD4), *R* = 0.36 and *EV* = 0.16 for the prediction of developing AIDS within 3 years, and *R* = 0.37 and *EV* = 0.16 for the prediction of dying within 5 years. If cd4 is replaced by load (+LOAD), we have *R* = 0.38 and *EV* = 0.15 (AIDS) and *R* = 0.38 and *EV* = 0.17 (death). Compared with cd4 and load, si (+SI) has about as good a predictive value for *EV* (0.14 and 0.19) but a somewhat lower predictive value for *R* (0.20 and 0.27). When cd4 and load were included as binary covariables, with cutoff values determined by the overall percentage of SI in the data set (i.e., 13%, resulting in cutoff values of 250 cells/μL for cd4 and 5.1 for log load), si had a higher predictive value than cd4 (+CD4 <250) and load (+LOAD >5.1). After inclusion of both cd4 and load, adding si increases predictive value (*R* = 0.50 and *EV* = 0.29 for AIDS and *R* = 0.51 and *EV* = 0.32 for death), but adding ccr5 does not. Nevertheless, ccr5 does improve the predictive value if load is not included. After inclusion of cd4, load, and si, the only further increase in predictive value, and for *EV* only, is found after inclusion of the external covariables years since seroconversion and calendar period. We also compared our model with cd4 and load as continuous variables (model “+CD4+LOAD”) with the categoric model used by Mellors et al. (^{3}), to which we added age. The categoric model has a lower predictive value than ours: *R* = 0.41 for both AIDS and death, and *EV* = 0.21 for AIDS and 0.22 for death.
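One reading of the prevalence-matched dichotomization, stated here as an assumption, is that each cutoff is chosen so that the same share of records falls in the high-risk category as for si (13%, the overall SI proportion): the lowest CD4 counts for cd4 and the highest loads for load. A minimal sketch with a hypothetical helper `matched_cutoff` and toy data:

```python
from typing import Sequence


def matched_cutoff(values: Sequence[float], target_fraction: float,
                   high_is_risk: bool) -> float:
    """Cutoff placing about target_fraction of records in the high-risk group.

    For CD4 (low is risky) the high-risk group is values <= cutoff;
    for log load (high is risky) it is values >= cutoff.
    """
    ordered = sorted(values, reverse=high_is_risk)
    k = max(1, round(target_fraction * len(ordered)))
    return ordered[k - 1]


values = list(range(1, 101))  # toy data, not study data
print(matched_cutoff(values, 0.13, high_is_risk=False))  # 13
print(matched_cutoff(values, 0.13, high_is_risk=True))   # 88
```

With tied values the high-risk group can be slightly larger than the target share; matching the share to the SI prevalence puts the binary markers and si on an equal footing when their predictive values are compared.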

We also compared changes in the predictive value of cd4 and load over time since seroconversion. To this end, we fitted and validated models of residual time to AIDS, including age and either cd4 or load, for each baseline year separately. Load had a higher predictive value than cd4 in the first 2 years after seroconversion, but no difference was present thereafter (Table 2).

##### Summary of Clinically Relevant Findings

Because time since seroconversion is hardly ever known, and because the calendar period variable was included only to capture the effects of treatment and refers only to the past, these variables are of little use for prediction. In Figure 3, we show the simultaneous effect of the markers cd4, load, and si on the estimated probability of developing AIDS within 3 years and of dying within 5 years.

Each black line depicts combinations of cd4 and load that yield an equal probability of developing AIDS within 3 years or of dying within 5 years. Hence, a person who has an SI phenotype with a CD4 count of 400 cells/μL and a log HIV-1 RNA load of 4.5 (about 32,000 copies/mL) has a probability of dying within 5 years of about 75% (the + sign in the upper right graph). If HAART were recommended to all persons who have more than a 30% chance of developing AIDS within 3 years (^{1}), almost everyone who has an SI phenotype would be advised to start HAART (everyone who has a cd4/load combination above the thick black line in the upper left graph). Note that only 5% (7/138) of the SI values were found at cd4/load combinations below the thick black line. Also note that 5% of the SI values were found at CD4 counts above 750 cells/μL (highest: 980 cells/μL) and that 5% of the CD4 counts above 750 cells/μL were SI.
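The iso-probability lines follow directly from the Cox model. With cumulative baseline hazard *H*₀(*t*) and linear predictor *lp*, the predicted probability of the event by time *t* is 1 − exp(−*H*₀(*t*) e^{lp}), so a fixed probability *p* corresponds to a fixed *lp* = log(−log(1 − *p*)) − log *H*₀(*t*). If, as a simplifying assumption, *lp* is linear in cd4^{1/4} and log10 load (no cd4 × load term, with all other covariables collected in an offset), each line can be solved for log load as a function of CD4 count. The sketch below does this; the coefficients are made up for illustration and are not estimates from this study.

```python
import math


def event_probability(cum_baseline_hazard: float, lp: float) -> float:
    """Cox model: P(event by t | x) = 1 - exp(-H0(t) * exp(lp))."""
    return 1.0 - math.exp(-cum_baseline_hazard * math.exp(lp))


def iso_log_load(cd4: float, p: float, cum_baseline_hazard: float,
                 b_cd4: float, b_load: float, offset: float = 0.0) -> float:
    """log10 load at which P(event by t) equals p for the given CD4 count."""
    target_lp = math.log(-math.log(1.0 - p)) - math.log(cum_baseline_hazard)
    return (target_lp - offset - b_cd4 * cd4 ** 0.25) / b_load


# Hypothetical values, for illustration only (not fitted to any data).
H0_3YR, B_CD4, B_LOAD = 0.05, -1.2, 1.1
for cd4 in (200, 400, 600):
    ll = iso_log_load(cd4, 0.30, H0_3YR, B_CD4, B_LOAD)
    print(cd4, round(ll, 2))
```

Because the hypothetical load coefficient is positive, a person whose log load lies above the line for his CD4 count has a predicted probability above *p*. This is the sense in which "above the thick black line" identifies those who would be advised to start HAART under the 30% rule.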

#### DISCUSSION

We constructed a comprehensive model to predict residual time to AIDS and death based on information on three well-known markers (HIV-1 RNA load, CD4 lymphocyte count, and HIV-1 phenotype), three biologic cofactors (age and two genetic cofactors), and two external cofactors (years since seroconversion and calendar period). We included interaction effects that are biologically plausible.

Our model extends previous research (^{3}) in terms of both the variables used and the methods. It resembles the model used by Shi et al. (^{29}). The intrinsic time-varying nature of markers was circumvented by the use of baseline values. By combining different baseline time points in one analysis, we made better use of the available longitudinal marker information. This approach can be applied to any analysis with marker values that change over time, provided that the correlation between repeated records of the same person is accounted for. The simplicity of the model made it easy to provide curves of residual time to AIDS and death. We used a definition of predictive value that reflects the purpose of prediction better than the *P* value does. Statistically significant variables may have little or no predictive value (^{30}), as our data also show for the CCR5-Δ32 alteration. Our model predicts on the basis of one set of marker and cofactor values, without including the patient's previous marker information. If patients are screened repeatedly, its predictive power would most likely be enhanced by adding information on marker development, but this requires more complicated models that include information on marker trajectories (^{31–35}).

At first sight, the highly significant effect of years since seroconversion on AIDS and death risk in the full models contradicts earlier results (^{36–39}). However, these studies either answered different questions (^{37,38}) or did not consider a piecewise linear effect of years since seroconversion (^{36,38}). When years since seroconversion was modeled only as a linear effect, the *P* value increased markedly (AIDS: from <.0001 to .97; death: from .0008 to .62). Apparently, in the first years after seroconversion, CD4 count has less influence on time to AIDS than in later years, as is also seen in Figure 1. Shi et al. (^{39}) showed a residual time to AIDS that was slightly longer shortly after seroconversion, but no *P* values were given.

Furthermore, we noted highly significant effects of calendar period on AIDS and death risk. Others have found a lengthening of the time to AIDS in the period from 1988 to 1992 (^{40}), perhaps because of the increasing use of AZT and *Pneumocystis carinii* prophylaxis, which reportedly extend time to AIDS and death by about 1 year (^{41}). However, we found effects only at high CD4 counts, which contradicts earlier results (^{42,43}). Although treatment effects seem to be present, it is hard to correct for them at the individual level (^{40}).

In the analyses including load, incomplete and inconsistent load data could have biased the results. Nevertheless, when we fitted the model excluding load to only those records for which load was available (957 records), the RRs showed no visible bias compared with the same model fitted to all 1401 records.

The purpose of the study was to guide the clinician in deciding when to start HAART and to find out which markers and cofactors have the highest predictive value. Besides CD4 count and HIV-1 RNA load, HIV-1 phenotype provides important extra information, especially at high CD4 counts. The genetic cofactors considered have little or no additional predictive value. The external covariables add some predictive value. Figure 3 summarizes the important findings. If HAART were recommended to all persons who have more than a 30% chance of developing AIDS within 3 years, everyone who has a cd4/load combination above the thick black line in the left graphs would be advised to start HAART.

#### Acknowledgments:

This study was performed as part of the Amsterdam Cohort Studies on HIV infection and AIDS, a collaboration between the Municipal Health Service, the Academic Medical Centre, and the Central Laboratory of the Netherlands Red Cross Blood Transfusion Service Sanquin Division, Amsterdam, The Netherlands. The authors thank Maria Prins and Lucy Phillips for critically reading the manuscript.