Numerous studies have demonstrated that race/ethnicity can be an important factor in determining health and risk for specific diseases.1–4 However, although most studies have used self-reported race/ethnicity (SRR), it is known that this approach to determining ancestry is often inaccurate, does not reveal the extent of genetic admixture, and can mask the importance of race/ethnicity as a risk for disease or complications associated with specific drugs or drug combinations.5–8 To measure the extent of admixture within a given individual, panels of ancestry informative markers (AIMs) are being used to characterize ancestry. Genetically determined ancestry (GDA) estimates are measurements based on genomic variants, which are predictive of the regions (or ancestry) from which individuals inherited their genetic alleles. Because of population stratification, the allele frequency for some single nucleotide polymorphisms (SNPs) can be extremely high within some regional populations, but low among other regional populations.9 These differences can be extensive, permitting the identification of panels of SNPs to identify continental ancestry.9 Additionally, these same regional patterns suggest that AIMs may also be correlated with other SNPs which may be predictive or causal for a specific clinical phenotype.10 Therefore, AIMs have been found to be useful for disease-risk assessment and for control of confounding that may be due to population stratification.
As a result of the human diaspora that has occurred in the last 400–600 years, some regions in the world have populations with high admixtures of continental ancestry.9 The result is that self-reported race and ethnicity groups can include a mixture of genetic backgrounds. Self-reported race and ethnicity can also reflect the historical and/or social context of the group with which the person identifies.3 This implies that self-reported race measures a mixture of genetics and social factors, leaving self-reported race as a poor measurement for the genetic composition of an individual.
Although human immunodeficiency virus type-1 (HIV) can infect all populations, in the United States and other countries, specific groups are at increased risk of infection. However, even among these risk groups, despite recurrent high-risk exposures, some people do not become infected, whereas others who become infected may have substantially different rates of progression of HIV-associated diseases. Host genetic factors have been identified as important determinants for both risk of HIV infection and rate of disease progression.11–14 Although racial and ethnic minorities, particularly those of African American and Hispanic ancestry, are disproportionately infected with HIV, few studies have examined the impact of ancestry on HIV disease, and none to our knowledge in children. In a study of 91 HIV-infected adults, it was shown that the association between the CYP2B6 metabolizer phenotype and virologic response to an NNRTI-based (efavirenz or nevirapine) antiretroviral regimen was confounded by GDA and that self-reported race/ethnicity was insufficient to account for it.15 A more nuanced conclusion was reached in a paper from the Multicenter AIDS Cohort Study that described dyslipidemia in 1779 HIV-1-infected men.16 The authors found a significant interaction between GDA and HIV/HAART status for all lipids tested, a low concordance between self-reported race and GDA in admixed populations, and better performance of GDA relative to self-reported race in statistical models. However, they still concluded that self-reported race remained a good clinical surrogate for GDA. In an additional study of 310 North American HIV-infected participants, the hazard ratios for the time to virologic suppression when comparing the CCR5-2459 GA and AA genotypes to the GG genotype were stronger among participants with higher African ancestry, but no associations were found between GDA and the time to virologic suppression.17
In the research reported here, we have used a novel 41-SNP panel of AIMs18 to identify continental origin and admixture among HIV-infected children in the United States who were studied before the availability of effective combination antiretroviral therapy (cART). We have investigated the relationship between genetically determined ancestry (GDA) and self-reported race (SRR). Additionally, we have examined the associations of continental ancestry with CD4+ measures and HIV plasma RNA in children before the initiation of cART.
The analysis participants were children who participated in the Pediatric AIDS Clinical Trials Group (PACTG) protocols P152 (n = 431)19 and P300 (n = 563).20 These trials were US-based, prospective, randomized, double-blind, placebo-controlled, multicenter protocols that assessed the efficacy of combination nucleoside reverse transcriptase inhibitor (NRTI) treatment regimens before the availability of effective cART. To be included in one of these trials, a child needed to have symptomatic HIV infection and be between 3 months and 18 years of age for P152,21 or between 42 days and 15 years of age for P300. For both protocols, the children had to meet criteria for a diagnosis of HIV infection under the Centers for Disease Control and Prevention (CDC) classification system in use at the time the protocols accrued. The vast majority of these children were HIV-infected through mother-to-child transmission, as this cohort predates the routine administration of antiretrovirals to pregnant women.
Genetic Ancestry Determination
A recently described highly informative panel of 41 AIMs was used to determine continental ancestry.18 Each of the 41 SNPs was detected using real-time PCR on DNA specimens obtained from peripheral blood mononuclear cells of 994 HIV-infected children in P152 and P300.
The continental origin of analysis participants was estimated by comparing each child's genotypes to allele frequencies found in a large set of 3517 reference individuals originating from 107 populations around the world.18 Reference populations were grouped into 7 world regions: Europe, Africa, America, Central/South Asia, South/West Asia, East Asia, and Oceania. Population structure and ancestry estimates were obtained in a trained clustering analysis using STRUCTURE v2.3.2.1.22,23 Five independent runs were performed at K = 7, using 20,000 burn-in cycles and 20,000 MCMC replications under the admixture model, including prior population information of the reference set. Allele frequencies were updated using only individuals with population information at a migration prior of 0.05. Uniform priors were used for the degree of admixture (“infer α” option) and for the allele frequency (λ = 1 option). All other parameters were set at default. Continental ancestry calling was performed by assigning the predominant continental origin to each subject.
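The final calling step described above (assigning each subject the predominant continental origin) can be illustrated with a minimal sketch. It assumes that each subject's estimated membership proportions across the 7 regions are available as a list in a fixed region order; the function name, the region ordering, and the example proportions are hypothetical and are not taken from the study data or from STRUCTURE output files.

```python
# Region order is an assumption made for this illustration only.
REGIONS = [
    "Europe", "Africa", "America", "Central/South Asia",
    "South/West Asia", "East Asia", "Oceania",
]

def call_continental_ancestry(proportions):
    """Return the region with the largest estimated membership proportion.

    Ties are resolved in favor of the region listed first in REGIONS.
    """
    if len(proportions) != len(REGIONS):
        raise ValueError("expected one proportion per region")
    best_index = max(range(len(REGIONS)), key=lambda i: proportions[i])
    return REGIONS[best_index]

# Hypothetical subject: proportions are invented and sum to 1.
example_subject = [0.25, 0.60, 0.10, 0.02, 0.01, 0.01, 0.01]
called_region = call_continental_ancestry(example_subject)
print(called_region)
```

A hard call of this kind discards the admixture information, which is why the percent GDA values are carried forward separately in the analyses.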
Three-way admixture of the analysis children determined to be from Africa, Europe, or America was further estimated using STRUCTURE with prior population information of reference populations from Africa (N = 761), Europe (N = 1011), and America (N = 407) under an admixture model with correlated allele frequencies at K = 3 groups and reported as percent GDA.
CD4+ lymphocyte count, CD4+ lymphocyte percentage, and HIV plasma RNA were used as the outcomes for this study. These were measured at entry before initiation of therapy. P152 used the NASBA HIV-1 RNA QT Amplification System21 and P300 used the Roche Amplicor quantitative RNA PCR assay20 to measure HIV plasma RNA.
Baseline characteristics are presented as percentages, means, and standard deviations as appropriate. Linear regression with a robust variance estimator24 was used to measure the associations between GDA and CD4+ lymphocyte counts, CD4+ lymphocyte percent, and log10 HIV RNA. Adjustment variables were all selected a priori and included age, weight-for-age z-score, study (P152/P300), and, where appropriate, self-reported race/ethnicity. Genetically determined ancestry was entered as a proportion in the regression analyses so that the regression slopes are interpreted for a 100% change in GDA. This parameterization was used so that a direct comparison to self-reported race could be made. Because the GDA proportions sum to 100%, the region not in the model (eg, African GDA) is interpreted as the reference group. In total, we considered 6 separate regression models to estimate the adjusted associations. The first model (model 1) was used to estimate the effects of GDA without adjustment for SRR. The second model (model 2) was used to estimate the effects of SRR without adjustment for GDA. The third model (model 3) included both GDA and SRR so that GDA and SRR are adjusted for each other. The remaining models (model 4 through model 6) included GDA after subsetting based on SRR. All confidence intervals (CI) are 95% CI, and P-values <0.05 were considered to be statistically significant. R version 2.15.1 and SAS version 9.2 were used for the analyses.
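The parameterization described above can be sketched in a few lines of code. The example below is an illustration under stated assumptions, not the authors' analysis: it fits ordinary least squares with White's HC0 sandwich variance (the exact robust variant used in the study is not stated), uses simulated ancestry proportions and an invented outcome, and omits the adjustment covariates. African ancestry is left out of the design matrix, so each slope is read as the mean change for a 100% shift from African ancestry to the named region.

```python
import random

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def invert(m):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols_hc0(X, y):
    """Return (coefficients, HC0 robust standard errors)."""
    Xt = transpose(X)
    bread = invert(matmul(Xt, X))  # (X'X)^-1
    Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
    beta = [sum(b * v for b, v in zip(row, Xty)) for row in bread]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    k = len(beta)
    # "Meat" of the sandwich: sum of e_i^2 * x_i x_i'
    meat = [[sum(e * e * row[i] * row[j] for row, e in zip(X, resid))
             for j in range(k)] for i in range(k)]
    cov = matmul(matmul(bread, meat), bread)
    return beta, [cov[i][i] ** 0.5 for i in range(k)]

# Simulated data: all numbers below are invented for illustration.
random.seed(0)
X, y = [], []
for _ in range(400):
    raw = [random.random() for _ in range(3)]
    total = sum(raw)
    p_afr, p_eur, p_am = (v / total for v in raw)  # proportions sum to 1
    X.append([1.0, p_eur, p_am])  # p_afr omitted: Africa is the reference
    y.append(5.0 + 0.2 * p_eur - 0.3 * p_am + random.gauss(0.0, 0.1))

beta, se = ols_hc0(X, y)
contrast_am_vs_eur = beta[2] - beta[1]  # America relative to Europe
print([round(b, 2) for b in beta], [round(s, 3) for s in se])
```

Because the three proportions sum to 1, including all of them alongside an intercept would make the design matrix singular; dropping one region is what turns it into the reference group.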
Of the 994 participants for whom the complete panel of the 41 AIMs was determined, 61% self-reported as black, 25% self-reported as Hispanic, 13% self-reported as white, and 1% reported other races or no race (Table 1). Fifty-five percent were female; the average age was 3.8 years; and the average weight-for-age z-score was −0.66. The mean CD4+ lymphocyte count was 981; the mean CD4+ percent was 24%; and the mean plasma log10 HIV RNA was 5.11. Of the 994 subjects with AIMs measurements, 826 had HIV RNA data (168 missing), and 987 had CD4+ counts and percentages (7 missing). HIV RNA data were missing because of limited specimen availability.
HIV-1-Infected Children in the United States Demonstrate a High Degree of Admixture
We first examined the association of SRR with GDA for each subject. Because the continental ancestry of the vast majority of subjects clustered in 3 regions (Africa, Europe, and the Americas), these continents were used to describe the GDA in these analyses. As seen in Figure 1, histograms of GDA for the 3 regions by self-reported race illustrate the relative skewness (departure from symmetry), kurtosis (degree of “peakedness”), and extensive variability among self-reported racial groups. All of the histograms display strong skewness, with the possible exception of the European GDA for those who self-report as Hispanic. Histograms with the most kurtosis include the American GDA for those who self-report as white or black, and the African GDA for those who self-report as white. Figure 2 displays the mean GDA by self-reported race and the overall continental ancestry distribution for the entire cohort. For those who self-reported as black, the mean GDA was 74% for African ancestry, 17% for European ancestry, and 9% for Native American ancestry. For those who self-reported as white, the mean GDA was 14% for African ancestry, 76% for European ancestry, and 10% for Native American ancestry. For self-reported Hispanics, the mean GDA was 25% for African ancestry, 53% for European ancestry, and 22% for Native American ancestry.
Knowledge of Genetic Ancestry and Admixture Proportions Adds Additional Information Beyond Self-Reported Race for Pretreatment HIV RNA
Because viral load is an important indicator of HIV replication and a predictor of disease progression, we examined the role of genetic ancestry in determining the quantity of virus detected in the plasma of subjects before their initiation of antiretroviral therapy. In our initial analyses, we used regression across all self-reported racial/ethnic groups to compare pretreatment HIV RNA in subjects with European or Native American ancestry to that in subjects with African ancestry (Table 2). In this analysis (model 1), a higher percentage of European ancestry was associated with higher log10 RNA relative to children with more African ancestry [mean change in log10 RNA for a 100% change in GDA (slope) = 0.18, CI: 0.01 to 0.36, P-value = 0.041]. In the same analysis, children with a higher percentage of Native American ancestry had a lower log10 viral load than those with more African ancestry, but the difference was not statistically significant (slope = −0.29, CI: −0.79 to 0.20, P-value = 0.25). Similarly, children with a higher percentage of Native American ancestry had a marginally significantly lower log10 viral load than those with more European ancestry (slope = −0.47, CI: −1.00 to 0.06, P-value = 0.080). The higher log10 RNA for those with more European ancestry persisted after controlling for self-reported race (slope = 0.33, CI: 0.03 to 0.62, P-value = 0.028) (model 3), and the direction of the estimated slopes was similar after subsetting on self-reported race (models 4 through 6). When the cohort was divided into subsets based on self-reported race, there was only 1 statistically significant result (self-reported blacks: slope = 0.62, CI: 0.18 to 1.05, P-value = 0.006). The interaction test for GDA and SRR was statistically significant (model 3 plus an interaction term, P-value = 0.039), implying that the mean change of log10 HIV RNA as a function of GDA differed by SRR.
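The three model 1 slopes reported above are mutually consistent, which can be seen by writing out the model with African ancestry as the reference. The notation here is introduced only for illustration and is not the authors': p_E and p_A denote the European and Native American GDA proportions, z the adjustment covariates, and the hats mark the reported estimates.

```latex
\begin{align*}
\mathbb{E}\left[\log_{10}\mathrm{RNA}\right]
  &= \beta_0 + \beta_E\,p_E + \beta_A\,p_A + \gamma^{\top} z,
  \qquad p_{\mathrm{Afr}} = 1 - p_E - p_A \\
\hat{\beta}_E &= 0.18 \quad \text{(Europe vs Africa)} \\
\hat{\beta}_A &= -0.29 \quad \text{(America vs Africa)} \\
\hat{\beta}_A - \hat{\beta}_E &= -0.29 - 0.18 = -0.47 \quad \text{(America vs Europe)}
\end{align*}
```

The America versus Europe point estimate therefore follows from the other two, whether it was obtained as a contrast or by refitting with Europe as the reference; its confidence interval cannot be derived this way because it also depends on the covariance between the two slope estimates.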
Model 2 and model 3 (Table 2) contain comparisons of self-reported race for the mean log10 HIV RNA without and with adjustment for GDA, respectively. When comparing those who self-report as white to those who self-report as black, there was a marginally significant result (slope = 0.13, CI: −0.01 to 0.28, P-value = 0.065) (model 2). This estimate is similar to the Europe estimate from model 1. After controlling for GDA, no significant association was identified (model 3, with self-reported black as the SRR reference group), indicating that continental ancestry was a stronger predictor of viral set point than self-reported race/ethnicity.
Association of Continental Ancestry With CD4+ Lymphocyte Count and Percentage
We performed regression analyses similar to those described above for the association of continental ancestry with CD4+ count and percentage (Table 2). When controlling for GDA (model 3), in the analyses of self-reported race, those who self-identified as white had a higher CD4+ count compared with those who self-identified as black (slope = 243, CI: 28 to 448, P-value = 0.025). Similarly, when CD4+ percentage was used in the analyses, subjects who self-reported as white had on average a higher CD4+ percentage than those who self-identified as black (slope = 3.5, CI: 0.2 to 6.7, P-value = 0.039). In adjusted linear regression models, subjects with 100% European ancestry were estimated to have a CD4+ count 253 cells per cubic millimeter lower (95% CI: −517 to 11, P-value = 0.06) than subjects with 100% African ancestry. The SRR estimates differ between model 2 and model 3 because GDA and SRR are correlated.
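As a reading aid, a reported slope and its 95% CI can be checked against the reported P-value by backing out the standard error. The sketch below assumes a symmetric, normal-approximation (Wald) interval, which is an assumption about how the intervals were constructed rather than something stated in the text; the helper names are hypothetical. It uses the European versus African CD4+ count estimate quoted above (slope −253, CI: −517 to 11).

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval

def se_from_ci(lower, upper, z=Z_975):
    """Standard error implied by a symmetric Wald confidence interval."""
    return (upper - lower) / (2.0 * z)

def two_sided_p(estimate, se):
    """Two-sided P-value from a normal approximation."""
    z = abs(estimate) / se
    return math.erfc(z / math.sqrt(2.0))

se_cd4 = se_from_ci(-517.0, 11.0)
p_cd4 = two_sided_p(-253.0, se_cd4)
print(round(se_cd4, 1), round(p_cd4, 3))
```

Under these assumptions the implied P-value is approximately 0.06, in line with the reported value; small discrepancies for other estimates would be expected from rounding in the published figures.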
When we adjusted for host genetic factors that were previously found to be related to HIV RNA, CD4+ count and CD4+ percentage25 using the same participants, the estimated regression slopes for the GDA were similar (see Table S1, Supplemental Digital Content, http://links.lww.com/QAI/A764).
To our knowledge, this is the first report that has examined the importance of continental ancestry in a cohort of HIV-infected children from the United States. Our findings indicate that AIMs provide information above and beyond self-reported race and ethnicity, and demonstrate that there is considerable ancestry variability within self-reported race for these US-based studies of HIV-infected children. Associations were identified between GDA and HIV disease severity markers, such as HIV RNA, CD4+ counts, and CD4+ percent, with effects remaining after adjusting for self-reported race; these associations were strongest among those who self-reported as black. Additionally, the estimated associations between self-reported race and HIV RNA and CD4+ were stronger when adjusting for GDA. This implies that GDA may be a confounder for the socioeconomic effect of being a member of different racial groups, and that without adjustment for continental ancestry, this socioeconomic association could not be fully estimated. This argues for inclusion of AIMs in adjusted analyses when there is either a suspected strong genetic effect or a suspected strong socioeconomic effect on the outcome of interest.
Additionally, AIMs can be used to minimize bias associated with population stratification in case–control association studies of genetic markers.26,27 Clinically, AIMs may be useful in disease classification and identification of genetic risk. For example, a study of European Americans would have falsely associated genetic variants in the LCT and IRF4 genes with rheumatoid arthritis had it not accounted for ancestry.28 Moreover, in certain situations, differences may exist even within continental ancestry populations. For example, Menotti et al29 observed that when examining the risk for coronary heart disease, applying a model of northern Europeans to southern Europeans overestimated the absolute risk and vice versa. Because we also found similar results when controlling for some important genetic predictors, it is likely that additional genetic markers are related to CD4+ and HIV RNA, and are correlated with AIMs. This premise is supported by our findings that the associations with CD4+ and HIV RNA remain after adjustment for the genetic markers that we previously found associated with CD4+ and HIV RNA.
The previously reported confounding effect of GDA on the association between the CYP2B6 metabolizer phenotype and virologic response to an NNRTI-based (efavirenz or nevirapine) antiretroviral regimen15 would not have played a role in our findings because HIV RNA and CD4+ were measured before study participants received any antiretrovirals. However, the association that we found with HIV RNA supports the plausibility that GDA is a confounder when investigating a virologic response to an NNRTI-based antiretroviral regimen. In addition, the reported interaction between GDA and HIV/HAART status for lipid levels16 poses an interesting question for pediatric research, particularly given that an increase in lipid levels has been reported in children.30,31 This remains an area of open research. Lastly, the reported interaction between the CCR5-2459 genotype and GDA on the time to viral suppression17 might be congruent with our findings because it would be expected that there is a relationship between the interacting variables and the outcome under study32; nevertheless, the association between GDA and viral suppression was not statistically significant in the Cheruvu 2014 study. However, there are some important differences between the P152/P300 cohort examined in this study and these other HIV reports. P152/P300 consisted of children, whereas the other studies included adults; therefore, extrapolation may not be valid. In addition, we did not study the time to virologic suppression; rather, we studied pre-ART plasma HIV RNA, CD4+ counts, and CD4+ percentages.
There are a few limitations to our study. The P152/P300 protocols did not collect information on socioeconomic status and thus we were not able to more finely describe effects within varying socioeconomic levels. Also, because the P152/P300 studies were conducted in the 1990s, it is possible that infants were born to women more likely to have difficulty with substance abuse than observed in more contemporary HIV-infected pregnant women.33 Finally, children in this study had a median age of 3.77 years and did not have access to cART. It would be unusual in the United States for a child to reach this median age without having initiated antiretroviral therapy. Thus, the observed results might not generalize to children who have early access to cART.
In summary, we have found through the identification of continental ancestry that the population of children with HIV infection within the United States has considerable ancestral heterogeneity, and that self-reported race/ethnicity is often not truly reflective of a child's genetic background. Moreover, identification of continental ancestry provides additional information with regard to HIV RNA and CD4+ cell count and percent beyond what is observed with self-reported race/ethnicity. Therefore, it is possible that many studies in the HIV literature that have included ancestry in the analysis based on self-reported race could have resulted in misleading conclusions. The utilization of AIMs to identify continental ancestry should be considered when outcomes associated with HIV infection are likely to have a genetic component.
1. Witzig R. The medicalization of race: scientific legitimization of a flawed social construct. Ann Intern Med. 1996;125:675–679.
2. Burchard EG, Ziv E, Coyle N, et al. The importance of race and ethnic background in biomedical research and clinical practice. N Engl J Med. 2003;348:1170–1175.
3. Kaufman JS, Cooper RS. Commentary: considerations for use of racial/ethnic classification in etiologic research. Am J Epidemiol. 2001;154:291–298.
4. Tang H, Quertermous T, Rodriguez B, et al. Genetic structure, self-identified race/ethnicity, and confounding in case-control association studies. Am J Hum Genet. 2005;76:268–275.
5. Johnson J. Predictability of the effects of race or ethnicity on pharmacokinetics of drugs. Int J Clin Pharmacol Ther. 2000;38:53–60.
6. Kappelhoff BS, Huitema AD, Yalvaç Z, et al. Population pharmacokinetics of efavirenz in an unselected cohort of HIV-1-infected individuals. Clin Pharmacokinet. 2005;44:849–861.
7. Bidulescu A, Choudhry S, Musani SK, et al. Associations of adiponectin with individual European ancestry in African Americans: the Jackson Heart Study. Front Genet. 2014;5:22.
8. Daya M, van der Merwe L, van Helden PD, et al. The role of ancestry in TB susceptibility of an admixed South African population. Tuberculosis. 2014;94:413–420.
9. Winkler CA, Nelson GW, Smith MW. Admixture mapping comes of age. Annu Rev Genomics Hum Genet. 2010;11:65–89.
10. Winkler C, An P, O'Brien SJ. Patterns of ethnic diversity among the genes that influence AIDS. Hum Mol Genet. 2004;13:R9–R19.
11. Seldin MF, Price AL. Application of ancestry informative markers to association studies in European Americans. PLoS Genet. 2008;4:e5.
12. Theodorou I, Capoulade C, Combadiere C, et al. Genetic control of HIV disease. Trends Microbiol. 2003;11:392–397.
13. An P, Winkler CA. Host genes associated with HIV/AIDS: advances in gene discovery. Trends Genet. 2010;26:119–131.
14. Singh KK, Spector SA. Host genetic determinants of human immunodeficiency virus infection and disease progression in children. Pediatr Res. 2009;65:55R–63R.
15. Frasco MA, Mack WJ, Van Den Berg D, et al. Underlying genetic structure impacts the association between CYP2B6 polymorphisms and response to efavirenz and nevirapine. AIDS. 2012;26:2097–2106.
16. Nicholaou MJ, Martinson JJ, Abraham AG, et al. HAART-associated dyslipidemia varies by biogeographical ancestry in the multicenter AIDS cohort study. AIDS Res Hum Retroviruses. 2013;29:871–879.
17. Cheruvu VK, Igo RP Jr, Jurevic RJ, et al. African ancestry influences CCR5 -2459G>A genotype-associated virologic success of highly active antiretroviral therapy. J Acquir Immune Defic Syndr. 2014;66:102–107.
18. Nievergelt CM, Maihofer AX, Shekhtman T, et al. Inference of human continental origin and admixture proportions using a highly discriminative ancestry informative 41-SNP panel. Invest Genet. 2013;4:13.
19. Englund JA, Baker CJ, Raskino C, et al. Zidovudine, didanosine, or both as the initial treatment for symptomatic HIV-infected children. AIDS Clinical Trials Group (ACTG) Study 152 Team. N Engl J Med. 1997;336:1704–1712.
20. McKinney RE Jr, Johnson GM, Stanley K, et al. A randomized study of combined zidovudine-lamivudine versus didanosine monotherapy in children with symptomatic therapy-naive HIV-1 infection. The Pediatric AIDS Clinical Trials Group Protocol 300 Study Team. J Pediatr. 1998;133:500–508.
21. Palumbo PE, Raskino C, Fiscus S, et al. Virologic and immunologic response to nucleoside reverse-transcriptase inhibitor therapy among human immunodeficiency virus-infected infants and children. J Infect Dis. 1999;179:576–583.
22. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155:945–959.
23. Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics. 2003;164:1567–1587.
24. White H. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica. 1980;48:817–838.
25. Qin M, Brummel S, Singh KK, et al. Associations of host genetic variants on CD4+ lymphocyte count and plasma HIV-1 RNA in antiretroviral naive children. Pediatr Infect Dis J. 2014;33:946–952.
26. Lin DY, Zeng D. Correcting for population stratification in genomewide association studies. J Am Stat Assoc. 2011;106:997–1008.
27. Nassir R, Kosoy R, Tian C, et al. An ancestry informative marker set for determining continental origin: validation and extension using human genome diversity panels. BMC Genet. 2009;10:39.
28. Tian C, Plenge RM, Ransom M, et al. Analysis and application of European genetic substructure using 300 K SNP information. PLoS Genet. 2008;4:e4.
29. Menotti A, Lanti M, Puddu P, et al. Coronary heart disease incidence in northern and southern European populations: a reanalysis of the seven countries study for a European coronary risk chart. Heart. 2000;84:238–244.
30. Miller TI, Borkowsky W, DiMeglio LA, et al. Metabolic abnormalities and viral replication are associated with biomarkers of vascular dysfunction in HIV-infected children. HIV Med. 2012;13:264–275.
31. Aldrovandi GM, Lindsey JC, Jacobson DL, et al. Morphologic and metabolic abnormalities in vertically HIV-infected children and youth. AIDS. 2009;23:661–672.
32. VanderWeele TJ, Robins JM. Four types of effect modification: a classification based on directed acyclic graphs. Epidemiology. 2007;18:561–568.
33. Rough K, Tassiopoulos K, Kacanek D, et al. Dramatic decline in substance use by HIV-infected pregnant women in the United States from 1990 to 2012. AIDS. 2015;29:117–123.