Background and objective
Skin color and self-reported ethnicity have been used systematically in the pharmacogenetic and pharmacogenomic literature as phenotypic proxies for geographical ancestry. Population admixture, however, calls the appropriateness of this approach into question. We compared how effectively color-based and marker-based classifications of biogeographical ancestry explain the distribution of polymorphisms in GSTM1, GSTM3 and GSTT1 in the heterogeneous Brazilian population.
Methods
DNA from each of 335 healthy Brazilians was genotyped for a panel of insertion/deletion polymorphisms previously validated as ancestry informative markers. The GSTM1-null and GSTT1-null polymorphisms were detected by multiplex PCR, and the GSTM3*B polymorphism by restriction fragment length polymorphism analysis. A nonlinear logistic regression model was developed to describe the association between the GST polymorphisms and individual ancestry as estimated from the ancestry informative markers.
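Individual ancestry here was inferred from biallelic insertion/deletion markers with the Structure software, a Bayesian clustering program. As a much simpler illustration of how a continuous ancestry proportion can be read off such genotypes, the sketch below computes a maximum-likelihood estimate of one individual's African ancestry proportion under a two-population admixture model. It assumes that the insertion-allele frequencies in both parental populations are known, that markers are unlinked, and that a grid search is adequate; the function names and parameters are hypothetical, and this is not what Structure does internally.

```python
import math


def genotype_likelihood(insertion_count, freq_african, freq_other, m):
    """P(genotype) at one biallelic marker for an individual with African ancestry proportion m.

    Assumption: each allele independently comes from the African parental population with
    probability m, so the genotype follows Hardy-Weinberg proportions with allele frequency
    p = m * freq_african + (1 - m) * freq_other (hypothetical parental frequencies).
    """
    p = m * freq_african + (1.0 - m) * freq_other
    if insertion_count == 0:
        return (1.0 - p) ** 2
    if insertion_count == 1:
        return 2.0 * p * (1.0 - p)
    if insertion_count == 2:
        return p * p
    raise ValueError("insertion_count must be 0, 1 or 2")


def estimate_african_ancestry(genotypes, freqs_african, freqs_other, steps=1000):
    """Grid-search maximum-likelihood estimate of m in [0, 1] (markers assumed unlinked)."""
    best_m, best_loglik = 0.0, -math.inf
    for i in range(steps + 1):
        m = i / steps
        loglik = 0.0
        for g, fa, fo in zip(genotypes, freqs_african, freqs_other):
            # Floor the likelihood so a zero-probability genotype does not make log() fail.
            loglik += math.log(max(genotype_likelihood(g, fa, fo, m), 1e-300))
        if loglik > best_loglik:
            best_m, best_loglik = m, loglik
    return best_m
```

For example, with ten markers whose insertion allele has (hypothetical) frequency 0.9 in the African and 0.1 in the other parental population, an individual heterozygous at every marker receives an estimate of about 0.5. The output is a continuous proportion rather than a category, which is the form of covariate the regression analysis relies on.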
Results
Analysis of the ancestry informative marker data with the Structure software revealed only two significant clusters, one of which was taken as an estimate of the African component of ancestry. Nonlinear logistic regression showed that, across the range of African ancestry observed in the population sample (0.13–0.95), the odds of having the GSTM1-null genotype decrease (P<0.0004, Wald test), whereas the odds of carrying the GSTM3*B allele increase (P<0.0001), as the African component of ancestry increases. The proportion of African ancestry was not associated with the frequency of the GSTT1-null genotype. Within the self-reported Black and Intermediate groups, ancestry estimated from the ancestry informative markers differed significantly between GSTM1-null and non-null individuals, and between carriers and noncarriers of the GSTM3*B allele.
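The regression above treats the African ancestry proportion as a continuous covariate and reports Wald tests. The exact nonlinear specification is not given in this abstract, so the sketch below assumes one possibility: a polynomial in the ancestry proportion on the logit scale, fitted by Newton-Raphson in pure Python, followed by a single-coefficient Wald test. The polynomial form, the function names and the toy data are illustrative assumptions, not the authors' model.

```python
import math


def _sigmoid(eta):
    # Numerically stable logistic function.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def _invert(matrix):
    # Matrix inverse by Gauss-Jordan elimination with partial pivoting.
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("information matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def _score_and_information(design, ys, beta):
    # Gradient of the log-likelihood and Fisher information X^T W X.
    k = len(beta)
    score = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for row, y in zip(design, ys):
        mu = _sigmoid(sum(b * v for b, v in zip(beta, row)))
        w = mu * (1.0 - mu)
        for i in range(k):
            score[i] += (y - mu) * row[i]
            for j in range(k):
                info[i][j] += w * row[i] * row[j]
    return score, info


def fit_logistic(xs, ys, degree=2, max_iter=50, tol=1e-10):
    """Fit logit P(y = 1 | x) = b0 + b1*x + ... + b_degree*x**degree (assumed form).

    xs: African ancestry proportions in [0, 1]; ys: 1 if the genotype is present
    (e.g. GSTM1-null), else 0. Returns (beta, covariance = inverse information).
    """
    design = [[x ** p for p in range(degree + 1)] for x in xs]
    beta = [0.0] * (degree + 1)
    for _ in range(max_iter):
        score, info = _score_and_information(design, ys, beta)
        inv = _invert(info)
        step = [sum(inv[i][j] * score[j] for j in range(len(beta))) for i in range(len(beta))]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = _score_and_information(design, ys, beta)
    return beta, _invert(info)


def wald_test(beta, cov, index):
    """Wald chi-square (1 df) and two-sided p-value for H0: beta[index] == 0."""
    z = beta[index] / math.sqrt(cov[index][index])
    return z * z, math.erfc(abs(z) / math.sqrt(2.0))
```

A negative fitted slope for GSTM1-null, or a positive one for GSTM3*B carriage, would correspond to the directions reported above. Testing a linear and a quadratic term jointly would instead use the corresponding 2×2 block of the covariance matrix, which this sketch does not implement.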
Conclusion
Interethnic admixture is a source of cryptic population structure that may lead to spurious genotype–phenotype associations in pharmacogenetic and pharmacogenomic studies. Logistic regression modeling of the GST polymorphisms shows that admixture must be treated as a continuous variable rather than partitioned into arbitrary subcategories for the convenience of data quantification and analysis.