Objective: To identify a pre-HAART gene expression signature in peripheral blood mononuclear cells (PBMCs) predictive of CD4+ T-cell recovery during HAART in HIV-infected individuals.
Design: This retrospective study evaluated PBMC gene expression in 24 recently HIV-infected individuals before the initiation of HAART to identify genes whose expression is predictive of CD4+ T-cell recovery after 48 weeks of HAART.
Methods: The change in CD4+ T-cell count (ΔCD4) over the 48-week study period was calculated for each of the 24 participants. Twelve participants were assigned to the ‘good’ (ΔCD4 ≥ 200 cells/μl) and 12 to the ‘poor’ (ΔCD4 < 200 cells/μl) CD4+ T-cell recovery group. Gene expression profiling of the entire transcriptome using Illumina BeadChips was performed with PBMC samples obtained before HAART. Gene expression classifiers capable of predicting CD4+ T-cell recovery group (good vs. poor), as well as the specific ΔCD4 value, at week 48 were constructed using methods of Class Prediction.
Results: The expression of 40 genes in PBMC samples taken before HAART predicted CD4+ T-cell recovery group (good vs. poor) at week 48 with 100% accuracy. The expression of 22 genes predicted a specific ΔCD4 value for each HIV-infected individual that correlated well with actual values (R = 0.82). Predicted ΔCD4 values were also used to assign individuals to good vs. poor CD4+ T-cell recovery groups with 79% accuracy.
Conclusion: Gene expression in PBMCs can be used as a biomarker to predict disease outcomes among HIV-infected individuals treated with HAART.
aDepartment of Medicine, University of California San Diego, La Jolla, USA
bVeterans Affairs San Diego Healthcare System, San Diego, USA
cDepartment of Pathology, University of California San Diego, La Jolla, California, USA
dBiometric Research Branch, Division of Cancer Treatment and Diagnosis, National Cancer Institute, NIH, Rockville, Maryland, USA
eIllumina Inc., San Diego, California, USA.
Received 21 August, 2009
Revised 12 October, 2009
Accepted 4 November, 2009
Correspondence to Christopher H. Woelk, Assistant Professor, Department of Medicine, University of California San Diego, Stein Clinical Research Building, Room 326, 9500 Gilman Drive 0679, La Jolla, CA 92093-0679, USA. Tel: +1 858 552 8585x7193; fax: +1 858 552 7445; e-mail: email@example.com
HIV-infected individuals who successfully suppress HIV replication while receiving HAART but minimally increase CD4+ T-cell counts in the peripheral blood are characterized as having poor immune recovery. Multiple definitions of poor immune recovery exist, including the failure to increase CD4+ T-cell counts within the first year of HAART above a certain threshold (i.e., 200 cells/μl) or by greater than 50, 100, or 200 cells/μl compared with counts taken before HAART initiation (ΔCD4) [1–5]. Poor CD4+ T-cell recovery is a common and significant health problem for HIV-infected patients, affecting close to a third of some cohorts [1,2]. Piketty et al. [4] demonstrated that the relative risk of clinical progression (AIDS-defining event or death) was 13.3-fold higher for patients with ΔCD4 less than 100 cells/μl compared with patients with increases above this threshold. Multiple factors have been associated with poor CD4+ T-cell recovery during HAART, including increasing age, hepatitis C virus co-infection, HAART regimen, persistent low level virus replication, decreased CD4+ T-cell proliferation, and host genetic variation.
A variety of disease states including viral infections [6–9], malignancies [10–12], and mental disorders [13,14] can modulate gene expression in peripheral blood mononuclear cells (PBMCs), as these cells circulate systemically. Therefore, PBMC gene expression profiling has been used to construct classifiers capable of diagnosing disease states, predicting disease outcomes and determining patient response to drug therapy [15,16]. Advantages of using PBMCs include their accessibility, ease of isolation from whole blood, and large yields per patient for subsequent RNA extraction.
Vahey et al. [8] previously analyzed gene expression in PBMC samples taken from 48 HIV-infected patients electing to discontinue HAART in the AIDS Clinical Trials Group (ACTG) Study A5170. Good outcome (N = 24) was defined as a decline of less than 20% in the CD4+ T-cell count over the 24 weeks following HAART discontinuation and poor outcome (N = 24) as a decline of more than 20%. Prediction analysis of microarrays in R [17] was used to identify 53 genes whose expression at HAART discontinuation could predict with 81% accuracy those patients who would later progress to the good vs. poor outcome at week 24. Interrupting HAART is not a viable therapeutic approach for HIV-infected patients, as it results in significant virus rebound [18,19] and increased morbidity and mortality [20]. Of greater clinical relevance would be the ability to predict, before HAART initiation, those HIV-infected patients who will later exhibit poor CD4+ T-cell recovery. To this end, we have analyzed gene expression in PBMC samples from 24 HIV-infected patients in the Acute Infection and Early Disease Research Program (AIEDRP) and constructed gene expression classifiers capable of predicting the extent of CD4+ T-cell recovery.
HIV-infected male participants in the San Diego AIEDRP cohort were retrospectively selected for this study. The study period was delineated by CD4+ T-cell counts taken before the start of HAART and after 48 weeks of HAART, which were then used to calculate ΔCD4. From a total of 328 patients, 98 were selected who were HAART-naive before enrollment, continuously adhered to HAART, developed (<50 HIV RNA copies/ml) and maintained (no subsequent concurrent measurements of >200 HIV RNA copies/ml) complete viral suppression during the study period, and had viably stored PBMC samples for microarray analysis taken within 2 weeks of the start of HAART. To focus on HIV-infected participants with lower CD4+ T-cell counts before HAART, 24 participants with baseline counts less than 500 CD4+ T cells/μl were selected for gene expression analysis. All but one of the study participants were treated continuously over the study period with a protease inhibitor (PI)-based or non-nucleoside reverse transcriptase inhibitor (NNRTI)-based HAART regimen. A single patient started therapy with a triple nucleoside reverse transcriptase inhibitor regimen (abacavir/lamivudine/zidovudine) but was switched to an NNRTI-based regimen during the study period. Twelve patients belonged to the good (ΔCD4 ≥ 200 cells/μl) and 12 to the poor (ΔCD4 < 200 cells/μl) CD4+ T-cell recovery group using the ΔCD4 threshold proposed by Haas et al. [5].
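The grouping rule described above can be written out in a few lines. The following Python sketch is illustrative only: the function names and the participant counts are hypothetical, and only the 200 cells/μl threshold comes from the text.

```python
def delta_cd4(baseline, week48):
    """Change in CD4+ T-cell count (cells/ul) between baseline and week 48."""
    return week48 - baseline


def recovery_group(delta, threshold=200):
    """Return 'good' if delta >= threshold (cells/ul), otherwise 'poor'."""
    return "good" if delta >= threshold else "poor"


# Hypothetical (baseline, week 48) counts in cells/ul, not data from the study.
participants = {"P01": (310, 560), "P02": (420, 540), "P03": (250, 450)}

groups = {
    pid: recovery_group(delta_cd4(baseline, week48))
    for pid, (baseline, week48) in participants.items()
}
```

Note that a participant whose ΔCD4 equals the threshold exactly (P03 in this toy example) falls in the good group, because the rule is ΔCD4 ≥ 200 cells/μl.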
Microarray gene expression analysis
Viable PBMCs from the 24 participants were obtained by rapidly thawing cryopreserved samples at 37°C. RNA was isolated from PBMC samples using RNeasy Mini Kits (QIAGEN, Germantown, Maryland, USA) and its quality was assessed by calculating an RNA integrity number (RIN) using the 2100 Bioanalyzer (Agilent Technologies, Santa Clara, California, USA). RNA from all 24 samples was deemed of sufficient quality (mean RIN 8.15 ± 0.93) for microarray gene expression analysis and was used to generate cDNA. Biotinylated labeled cRNA was generated from cDNA for hybridization to HumanWG-6 v3 Expression BeadChips (Illumina, San Diego, California, USA) and the expression analysis of 48 803 transcripts. Raw gene expression data were log2 transformed and robust spline normalized using the Bioconductor package lumi [21] in R (version 2.8.0). The quality of microarray data was confirmed using MA-plots constructed using the affyPLM package [22]. Genes whose expression was not detected in any of the samples were removed from further analysis. Class discovery analysis clustered samples based on their gene expression in an unsupervised manner and identified batch effects that were removed using ComBat [23]. Gene expression data are available at the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE19087.
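Two of the preprocessing steps above, the log2 transformation and the removal of genes not detected in any sample, are simple enough to sketch in plain Python. This is not the lumi pipeline: robust spline normalization and batch correction are not reproduced, the function names are hypothetical, and the detection P-value cutoff of 0.01 is an assumption rather than a value stated in the text.

```python
from math import log2


def log2_transform(intensities):
    """Log2-transform a genes x samples matrix of positive raw intensities."""
    return [[log2(value) for value in row] for row in intensities]


def detected_in_any(detection_p, alpha=0.01):
    """Indices of genes whose detection P-value is below alpha in >= 1 sample.

    alpha = 0.01 is an assumed cutoff, not one reported by the study.
    """
    return [
        gene
        for gene, row in enumerate(detection_p)
        if any(p < alpha for p in row)
    ]


# Hypothetical values for three genes in two samples.
raw = [[8.0, 16.0], [32.0, 64.0], [128.0, 256.0]]
pvals = [[0.50, 0.20], [0.001, 0.30], [0.004, 0.002]]

keep = detected_in_any(pvals)
filtered = [log2_transform(raw)[gene] for gene in keep]
```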
Class prediction of CD4+ T-cell recovery group
Supervised methods of Class Prediction in BRB-ArrayTools [24] were used to construct gene expression classifiers that could assign HIV-infected patients to a discrete CD4+ T-cell recovery group (good vs. poor). Classifiers composed of different numbers of genes were constructed by Recursive Feature Elimination (RFE) and their ability to predict CD4+ T-cell recovery was assessed by several multivariate classification methods [i.e., diagonal linear discriminant analysis, support vector machines (SVMs), and compound covariate, nearest neighbor and centroid predictors] in a leave-one-out cross-validation (LOOCV) approach.
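In a cross-validated design of this kind, gene selection is normally repeated within every leave-one-out fold so that the held-out participant never influences which genes are chosen. The Python sketch below illustrates that loop. It deliberately substitutes a simple mean-difference ranking and a nearest-centroid predictor for the RFE and SVM steps run in BRB-ArrayTools, and the function names and toy expression values are hypothetical.

```python
from statistics import mean


def top_genes(X, y, k):
    """Rank genes by absolute difference in class means; return top-k indices."""
    scores = []
    for g in range(len(X[0])):
        good = [row[g] for row, label in zip(X, y) if label == "good"]
        poor = [row[g] for row, label in zip(X, y) if label == "poor"]
        scores.append((abs(mean(good) - mean(poor)), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def nearest_centroid(X, y, genes, sample):
    """Assign the sample to the class with the closest centroid."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroid = [mean(r[g] for r in rows) for g in genes]
        dist = sum((sample[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(X, y, k):
    """Leave-one-out accuracy with gene selection redone in every fold."""
    correct = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = top_genes(X_train, y_train, k)  # held-out sample excluded
        if nearest_centroid(X_train, y_train, genes, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Toy log2 expression values (samples x genes); only gene 0 is informative.
X = [[8.0, 5.1, 6.0], [8.2, 4.9, 6.1], [7.9, 5.0, 5.9],
     [6.0, 5.0, 6.0], [6.1, 5.2, 6.2], [5.9, 4.8, 5.8]]
y = ["good", "good", "good", "poor", "poor", "poor"]

accuracy = loocv_accuracy(X, y, k=1)
```

Selecting genes once on all samples and only then cross-validating the classifier would leak information from the held-out sample and tend to overstate accuracy, which is why the selection step sits inside the loop.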
Prediction of ΔCD4 values for each HIV-infected participant
The least angle regression (LAR) plugin in BRB-ArrayTools was used to implement the LASSO algorithm, which develops a linear model for predicting a continuous response variable (ΔCD4) from gene expression data within a cross-validated framework. LASSO avoids the overfitting characteristic of least squares linear regression when the number of genes is large compared with the number of samples [25]. For more information regarding the above statistical methods, please refer to the BRB-ArrayTools manual.
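For reference, the LASSO estimate can be written in its standard penalized form. The notation here is an assumption introduced for illustration and does not come from the original text: y_i is ΔCD4 for participant i, x_ij is the normalized expression of gene j in participant i, n is the number of participants, p is the number of genes, and λ is the penalty weight. The 1/(2n) scaling is one common convention.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j\Bigr)^{2}
  \;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

The L1 penalty shrinks many coefficients to exactly zero, so the fitted model retains only a subset of genes. With λ = 0 the expression reduces to ordinary least squares, which has no unique solution when p exceeds n.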
The HIV-infected participants in the good (ΔCD4 ≥ 200 cells/μl) and poor (ΔCD4 < 200 cells/μl) CD4+ T-cell recovery groups were compared for differences in demographic, virological, and immunological data (Table 1). Participants in the good recovery group had significantly lower CD4+ T-cell counts and higher viral loads before the start of HAART. None of the other parameters were significantly different between recovery groups.
Microarray gene expression data for the entire transcriptome were generated for each participant using PBMC samples taken before HAART. The ability of the expression of different numbers of genes to predict whether a participant would progress to the good vs. poor CD4+ T-cell recovery group at week 48 was assessed using different multivariate classification methods in a LOOCV approach. Classification accuracy first reached 100% when the expression of 40 genes was used with the SVM multivariate classification method (Fig. 1a). Descriptions of these 40 genes are presented in Supplementary Table 1. No other multivariate classification method attained an accuracy of 100% for assigning patients to CD4+ T-cell recovery groups.
In addition to predicting CD4+ T-cell recovery group, gene expression prior to HAART was used to predict the specific ΔCD4 value for each participant at week 48. Twenty-two genes were identified in the final model using the LASSO algorithm whose expression predicted ΔCD4 values in a LOOCV approach that correlated with actual values with an R = 0.82 as calculated by Pearson's correlation analysis (Fig. 1b). Descriptions of these 22 genes are presented in Supplementary Table 2.
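Pearson's correlation between predicted and actual ΔCD4 values can be computed directly from its definition. The sketch below is a plain-Python illustration; the function name and the example values are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical actual and cross-validated predicted values (cells/ul).
actual = [50, 150, 220, 310]
predicted = [90, 120, 240, 280]

r = pearson_r(actual, predicted)
```

Because each prediction in a LOOCV scheme comes from a model fitted without that participant, a correlation computed this way reflects out-of-sample rather than in-sample agreement.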
Normally, gene expression in PBMCs is used to predict whether patients will progress to one of two discrete classes. In this respect, we have used Class Prediction methods to identify 40 genes whose expression can predict with 100% accuracy whether an HIV-infected individual will progress to the ‘good’ or ‘poor’ CD4+ T-cell recovery group after 48 weeks of HAART (Fig. 1a). However, because the recovery groups defined using a ΔCD4 threshold of 200 cells/μl differed in clinical parameters at baseline (Table 1), it is unclear whether this classifier is truly a prognostic classifier for recovery group or a diagnostic classifier for those baseline differences. Class Prediction analysis was therefore repeated using a different ΔCD4 threshold of 100 cells/μl to define good (N = 17) and poor (N = 7) recovery groups, as there were no differences in baseline statistics between groups at this threshold. Despite unbalanced numbers between recovery groups, 10 genes were identified that could predict recovery group with 92% accuracy (data not shown), indicating that gene expression before HAART initiation does indeed have value for predicting recovery group.
Using gene expression to predict the specific ΔCD4 value for each participant avoids the limitations of imposing a dichotomous threshold on a continuous variable in order to define outcome groups, and with them any significant differences in clinical data at baseline between these groups. The LASSO algorithm was used to select a final model of 22 genes whose expression levels were able to predict ΔCD4 values with good correlation (R = 0.82) to actual ΔCD4 values in a LOOCV approach (Fig. 1b). Predicted ΔCD4 values were able to assign HIV-infected participants to good and poor recovery groups based on the threshold of 200 cells/μl with 79% accuracy. With ΔCD4 thresholds of 100 and 300 cells/μl, the classification accuracies were 75 and 88%, respectively (data not shown).
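Turning continuous predictions into group assignments amounts to applying the same threshold to the predicted and the actual ΔCD4 values and counting agreements. The Python sketch below shows the idea; the function name and the example values are hypothetical and are not data from the study.

```python
def threshold_accuracy(actual, predicted, threshold):
    """Fraction of participants given the same group by actual and predicted values."""
    agree = sum(
        (a >= threshold) == (p >= threshold)
        for a, p in zip(actual, predicted)
    )
    return agree / len(actual)


# Hypothetical actual and predicted values (cells/ul).
actual = [50, 150, 220, 310]
predicted = [90, 210, 240, 280]

accuracy_by_threshold = {
    t: threshold_accuracy(actual, predicted, t) for t in (100, 200, 300)
}
```

With 24 participants, the reported accuracies of 79, 75, and 88% are consistent with 19, 18, and 21 correct assignments, respectively.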
The mechanistic implications for CD4+ T-cell recovery of the genes identified in this study are unclear (Supplementary Tables 1 and 2). Genes were selected based on their expression in PBMC samples and their ability to predict CD4+ T-cell recovery. Gene expression should be analyzed in the CD4+ T-cell subset, as it is directly related to the disease phenotype, in order to identify genes that drive CD4+ T-cell recovery. Genes differentially expressed in this subset between recovery groups (good vs. poor) following 48 weeks of HAART should be mapped to biological pathways and gene ontologies to elucidate the mechanism of CD4+ T-cell recovery.
In the future, gene expression classifiers may be formulated into bench-top assays for use in the HIV clinic to identify patients at risk of poor CD4+ T-cell recovery. Prior to use in the HIV clinic, the accuracy of the gene expression classifiers constructed in this study must be internally validated by analyzing a greater number of HIV-infected individuals in the AIEDRP cohort. Additionally, classifiers must be externally validated using patients from an unrelated HIV-infected cohort to avoid participant selection or demographic biases in the San Diego AIEDRP cohort [27]. In summary, the utility of gene expression data in HAART-naive HIV-infected individuals to predict the future course of disease has been clearly demonstrated.
Gene expression data were generated via funding through a Developmental Grant from the Center for AIDS Research (CFAR) at the University of California San Diego (UCSD). This work was performed with the support of the Genomics Core at the UCSD CFAR, the San Diego Veterans Medical Research Foundation, National Institutes of Health research grants (AI69432, AI043638, MH62512, MH083552, AI077304, AI36214, AI047745, AI007384 and AI74621), and a research grant from the California HIV/AIDS Research Program (RN07-SD-702). Microarray hybridization and scanning was performed at the UCSD Biomedical Genomics (BIOGEM) core facility with the help of Dr Gary Hardiman (Director) and James Sprague. Class Prediction analysis was performed using BRB-ArrayTools developed by Dr Richard Simon and the BRB-ArrayTools Development Team. We would like to thank the reviewers of this manuscript and Dr Sanjay Mehta whose comments resulted in a more coherent presentation of our results.
C.H.W. conceived the study, analyzed patient and microarray data, wrote the original manuscript, and addressed the reviewers' comments. N.B.B. analyzed microarray data and generated the figures presented in the manuscript. P.D. and S.E.R. extracted and assessed the quality of RNA from PBMC samples. Y.Z. implemented the LASSO algorithm for the prediction of ΔCD4 from gene expression data. M.G. and J.L. advised on experimental design. J.P. performed statistical analyses. M.G., D.D.R., D.S., and S.J.L. provided access to blinded patient clinical data and aided in the interpretation of gene expression data.
Although not a direct conflict of interest, C.H.W., M.G., D.D.R., D.S., and S.J.L. have received research support from Pfizer Inc. In addition, S.J.L. has served on the clinical advisory board for Monogram Biosciences Inc. and received research support from Merck Laboratories. D.D.R. is also a consultant for Theraclone, Myriad, Bristol-Myers Squibb, Anadys Pharmaceuticals Inc., Gilead Sciences, Hoffman-La Roche Inc., Merck and Co. Inc., Monogram Biosciences, Biota, Chimerx, Idenix and Gen-Probe. J.L. is employed by Illumina Inc., manufacturer of the microarray platform used in this study, but this does not represent a conflict of interest, as this platform was selected prior to his inclusion on the project and he did not perform any data analysis.
1. Gazzola L, Tincati C, Bellistri GM, Monforte A, Marchetti G. The absence of CD4+ T cell count recovery despite receipt of virologically suppressive highly active antiretroviral therapy: clinical risk, immunological gaps, and therapeutic options. Clin Infect Dis 2009; 48:328–337.
2. Schechter M, Tuboi SH. Discordant immunological and virological responses to antiretroviral therapy. J Antimicrob Chemother 2006; 58:506–510.
3. Goicoechea M, Smith DM, Liu L, May S, Tenorio AR, Ignacio CC, et al. Determinants of CD4+ T cell recovery during suppressive antiretroviral therapy: association of immune activation, T cell maturation markers, and cellular HIV-1 DNA. J Infect Dis 2006; 194:29–37.
4. Piketty C, Weiss L, Thomas F, Mohamed AS, Belec L, Kazatchkine MD. Long-term clinical outcome of human immunodeficiency virus-infected patients with discordant immunologic and virologic responses to a protease inhibitor-containing regimen. J Infect Dis 2001; 183:1328–1335.
5. Haas DW, Geraghty DE, Andersen J, Mar J, Motsinger AA, D'Aquila RT, et al. Immunogenetics of CD4 lymphocyte count recovery during antiretroviral therapy: an AIDS Clinical Trials Group study. J Infect Dis 2006; 194:1098–1107.
6. Motomura K, Toyoda N, Oishi K, Sato H, Nagai S, Hashimoto S, et al. Identification of a host gene subset related to disease prognosis of HIV-1 infected individuals. Int Immunopharmacol 2004; 4:1829–1836.
7. Ockenhouse CF, Bernstein WB, Wang Z, Vahey MT. Functional genomic relationships in HIV-1 disease revealed by gene-expression profiling of primary human peripheral blood mononuclear cells. J Infect Dis 2005; 191:2064–2074.
8. Vahey MT, Wang Z, Su Z, Nau ME, Krambrink A, Skiest DJ, Margolis DM. CD4+ T-cell decline after the interruption of antiretroviral therapy in ACTG A5170 is predicted by differential expression of genes in the ras signaling pathway. AIDS Res Hum Retroviruses 2008; 24:1047–1066.
9. Tateno M, Honda M, Kawamura T, Honda H, Kaneko S. Expression profiling of peripheral-blood mononuclear cells from patients with chronic hepatitis C undergoing interferon therapy. J Infect Dis 2007; 195:255–267.
10. Twine NC, Stover JA, Marshall B, Dukart G, Hidalgo M, Stadler W, et al. Disease-associated expression profiles in peripheral blood mononuclear cells from patients with advanced renal cell carcinoma. Cancer Res 2003; 63:6069–6075.
11. Burczynski ME, Twine NC, Dukart G, Marshall B, Hidalgo M, Stadler WM, et al. Transcriptional profiles in peripheral blood mononuclear cells prognostic of clinical outcomes in patients with advanced renal cell carcinoma. Clin Cancer Res 2005; 11:1181–1189.
12. DePrimo SE, Wong LM, Khatry DB, Nicholas SL, Manning WC, Smolich BD, et al. Expression profiling of blood samples from an SU5416 Phase III metastatic colorectal cancer clinical trial: a novel strategy for biomarker identification. BMC Cancer 2003; 3:3.
13. Glatt SJ, Everall IP, Kremen WS, Corbeil J, Sasik R, Khanlou N, et al. Comparative gene expression analysis of blood and brain provides concurrent validation of SELENBP1 up-regulation in schizophrenia. Proc Natl Acad Sci U S A 2005; 102:15533–15538.
14. Segman RH, Shefi N, Goltser-Dubner T, Friedman N, Kaminski N, Shalev AY. Peripheral blood mononuclear cell gene expression profiles identify emergent posttraumatic stress disorder among trauma survivors. Mol Psychiatry 2005; 10:500–513, 425.
15. Woelk CH, Burczynski ME. The clinical relevance of gene expression profiles in peripheral blood mononuclear cells. In: Columbus F, editor. Oligonucleotide array sequence analysis. Hauppauge, NY: NOVA Publishing; 2008. pp. 38–50.
16. Burczynski ME, Dorner AJ. Transcriptional profiling of peripheral blood cells in clinical pharmacogenomic studies. Pharmacogenomics 2006; 7:187–202.
17. Tibshirani R, Hastie T, Narasimhan B, Chu G. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci U S A 2002; 99:6567–6572.
18. Kaufmann DE, Lichterfeld M, Altfeld M, Addo MM, Johnston MN, Lee PK, et al. Limited durability of viral control following treated acute HIV infection. PLoS Med 2004; 1:e36.
19. Streeck H, Jessen H, Alter G, Teigen N, Waring MT, Jessen A, et al. Immunological and virological impact of highly active antiretroviral therapy initiated during acute HIV-1 infection. J Infect Dis 2006; 194:734–739.
20. El-Sadr WM, Lundgren JD, Neaton JD, Gordin F, Abrams D, Arduino RC, et al. CD4+ count-guided interruption of antiretroviral treatment. N Engl J Med 2006; 355:2283–2296.
21. Du P, Kibbe WA, Lin SM. Lumi: a pipeline for processing illumina microarray. Bioinformatics 2008; 24:1547–1548.
22. Bolstad BM, Collin F, Brettschneider J, Simpson K, Cope L, Irizarry R, Speed TP. Quality assessment of affymetrix GeneChip data. In: Gentleman R, Carey V, Huber W, Irizarry R, Dutoit S, editors. Bioinformatics and computational biology solutions using R and bioconductor. Heidelberg: Springer; 2005. pp. 33–47.
23. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 2007; 8:118–127.
24. Simon R, Lam A, Li M, Ngan M, Menenzes S, Zhao Y. Analysis of gene expression data using BRB-ArrayTools. Cancer Inform 2007; 2:11–17.
25. Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression. Ann Stat 2004; 32:407–499.
27. Simon R. Roadmap for developing and validating therapeutically relevant genomic classifiers. J Clin Oncol 2005; 23:7332–7341.