Population data on the HLA system are important to several areas of medicine and science. Increased susceptibility or resistance to more than 500 diseases has been shown to be associated with various HLA antigens, alleles, or haplotypes (1), and, more recently, there have been reports of the effect of HLA genotype on the response to different vaccines (2-5). Information about the distribution of HLA genotypes in different populations may be important to the development of peptide-based vaccines, because the level and type of immune response, the risk of adverse reactions, and the proportion of the population protected by vaccination may vary among populations with different distributions of HLA antigens (6). In transplantation, frequencies of HLA phenotypes and haplotypes can be used to determine the probability of finding a donor with a particular HLA phenotype for an individual patient and to predict the effect of allocation schemes based on HLA matching on different groups of patients (7). HLA population data are also used to study human migration and evolution (8, 9) and in forensic medicine (10, 11). Because of the extensive polymorphism of the HLA system, large population samples or family studies are needed to obtain meaningful estimates of allele and haplotype frequencies. HLA typing for organ and bone marrow transplantation has provided the opportunity to analyze HLA data from large populations tested as potential donors and/or patients. As of May 1995, the National Marrow Donor Program (NMDP)* registry listed HLA data on nearly 1.5 million people, and these data have been analyzed by others (12). We have analyzed the HLA data from 20,069 end-stage renal disease patients and 18,319 cadaver organ donors from the transplant registry of the United Network for Organ Sharing (UNOS) and present here allele frequencies and a portion of the two- and three-locus haplotype data.
We have reported previously (13) on the HLA antigen and phenotype distributions of these groups.
MATERIALS AND METHODS
We analyzed the HLA-A, -B, and -DR phenotypes of the donors listed in the UNOS registry from 1988 through 1992 and of the patients on the UNOS renal waiting list in April 1991. After we purged the data files of phenotypes containing nonexistent antigens or codes for unknown or “not tested” antigens, and of phenotypes from individuals of unknown race, data remained on 20,069 patients and 18,513 donors. The data were divided into racial subsets according to the racial designations used by UNOS. The group of 194 Oriental donors was too small to yield meaningful estimates of allele and haplotype frequencies and therefore was not included in the analyses. A few antigens were observed too infrequently to permit reliable estimation of the frequencies of haplotypes containing them. Therefore, for the haplotype frequency calculations, we converted each of these antigens to the broad specificity with which it is serologically cross-reactive, as follows: A66→A10; A74→A32; B59→B8; B75, B76, and B77→B15; B67→B16; B71 and B72→B70; and B47, B48, and B73→B40.
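Because the conversion is a fixed lookup from each rare antigen to its broad specificity, it can be expressed as a small table. The Python sketch below encodes exactly the conversions listed above; the tuple-of-antigen-names phenotype representation and the function name are hypothetical and do not reflect the registry's actual coding.

```python
# Conversions listed in the text: rare antigens -> serologically
# cross-reactive broad specificity (used only for haplotype calculations).
TO_BROAD = {
    "A66": "A10", "A74": "A32", "B59": "B8",
    "B75": "B15", "B76": "B15", "B77": "B15",
    "B67": "B16", "B71": "B70", "B72": "B70",
    "B47": "B40", "B48": "B40", "B73": "B40",
}


def collapse_rare_antigens(phenotype):
    """Return the phenotype with rare antigens replaced by their broad
    specificity; all other antigens pass through unchanged."""
    return tuple(TO_BROAD.get(antigen, antigen) for antigen in phenotype)


print(collapse_rare_antigens(("A2", "A66", "B7", "B75")))
```

Antigens not in the table, such as A2 and B7 here, are returned unchanged, so the function can be applied to every phenotype in a file.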
Allele frequencies for each subgroup were determined by maximum likelihood estimation, with alleles defined by the serologic antigens. Because most HLA antigens represent several alleles whose products cannot be distinguished by serologic typing, we treated two or more alleles encoding the same antigen as a single allele in determining frequencies. Each population was tested for fit to Hardy-Weinberg equilibrium (HWE) at each locus using a chi-squared goodness-of-fit test. The phenotypes in the data files did not represent a uniform set of antigens; some phenotypes contained broad antigens (e.g., A9 and B12), and others contained splits or subtypes (e.g., A23, A24, B44, and B45). No phenotype contained both a subtype and its respective broad antigen, because the subtype cannot be distinguished in the presence of the broad antigen. This was taken into account in the calculation of allele frequencies. For the goodness-of-fit tests, we converted all subtypes to their respective broad antigens to accommodate the masking of subtypes by broad antigens.
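The likelihood used in the analysis is not spelled out here, so the following is only a sketch of one common way to obtain maximum likelihood allele frequencies when a serologically silent (“blank”) allele is present: the expectation-maximization (EM) algorithm. It assumes Hardy-Weinberg proportions, ignores the masking of splits by broad antigens described above, and uses a hypothetical function name and phenotype encoding. Each single-antigen phenotype is split between the homozygote (i/i) and the blank heterozygote (i/blank) in proportion to p_i^2 and 2 p_i p_blank under the current estimates.

```python
from collections import defaultdict

BLANK = "blank"  # hypothetical label for the serologically silent allele


def em_allele_frequencies(phenotype_counts, max_iter=5000, tol=1e-12):
    """EM estimates of allele frequencies at one locus, including a blank allele.

    phenotype_counts maps a tuple of detected antigens to a count:
    ("A1", "A2") is a two-antigen phenotype, ("A1",) shows one antigen,
    and () shows none. Hardy-Weinberg proportions are assumed.
    """
    n = sum(phenotype_counts.values())
    antigens = sorted({a for ph in phenotype_counts for a in ph})
    alleles = antigens + [BLANK]
    freq = {a: 1.0 / len(alleles) for a in alleles}  # uniform starting point
    for _ in range(max_iter):
        counts = defaultdict(float)  # expected allele counts (E-step)
        p0 = freq[BLANK]
        for ph, c in phenotype_counts.items():
            if len(ph) == 2:
                # Two antigens: genotype i/j is fully determined.
                counts[ph[0]] += c
                counts[ph[1]] += c
            elif len(ph) == 1:
                # One antigen: i/i has weight p_i^2, i/blank has 2 p_i p0.
                i = ph[0]
                w_homo = freq[i] / (freq[i] + 2.0 * p0)
                counts[i] += c * (1.0 + w_homo)
                counts[BLANK] += c * (1.0 - w_homo)
            else:
                # No antigen detected: blank/blank.
                counts[BLANK] += 2.0 * c
        # M-step: expected allele counts divided by the 2n chromosomes.
        new = {a: counts[a] / (2.0 * n) for a in alleles}
        change = max(abs(new[a] - freq[a]) for a in alleles)
        freq = new
        if change < tol:
            break
    return freq


counts = {("A", "B"): 3000, ("A",): 4500, ("B",): 2100, (): 400}
print(em_allele_frequencies(counts))
```

The hypothetical counts above were constructed from allele frequencies of 0.5 (A), 0.3 (B), and 0.2 (blank), and the iteration recovers those values. The E-step is where a single-antigen phenotype is attributed partly to homozygosity and partly to the blank allele.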
Two- and three-locus haplotype frequencies were calculated using the square-root method (14). The algorithm for two-locus haplotypes is well known; that for three-locus haplotypes is described in Figure 1. When the frequencies of all haplotypes for a group did not sum to one because of rounding errors, the frequencies were normalized by dividing each by the sum obtained. Linkage disequilibrium is the nonrandom association of alleles at linked loci. Its numerical value is the difference between the observed haplotype frequency and the frequency expected if the alleles of the haplotype were associated at random. Linkage disequilibrium values were therefore obtained by subtracting the product of the allele frequencies from the haplotype frequency: $\Delta_{ij} = h_{ij} - p_i q_j$ and $\Delta_{ijk} = h_{ijk} - p_i q_j r_k$, where i, j, and k denote alleles at the linked loci; Δ is the linkage disequilibrium value (for the haplotype comprising alleles i and j, or i, j, and k); h is the haplotype frequency; and p, q, and r are the allele frequencies. The strengths of linkage disequilibria of different haplotypes cannot be compared using Δ itself, because Δ depends on the frequencies of the alleles in the haplotype. To permit such comparisons, we obtained relative (normalized) linkage disequilibrium values, Δrel, by dividing the linkage disequilibrium value of each haplotype by the maximum value possible for that haplotype. The possible range of Δrel is 0 to +1. Maximum linkage disequilibrium values for two- and three-locus haplotypes were calculated as follows:
$$\Delta_{\max,ij} = q_j(1 - p_i), \qquad \Delta_{\max,ijk} = r_k(1 - p_i q_j)$$
where p, q, and r are the allele frequencies, with p > q for two-locus haplotypes and p > r < q (that is, r is the smallest of the three) for three-locus haplotypes. All analyses were performed on a personal computer using programs written by one of us (A.G.S.) in APL (A Programming Language).
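For two loci, the square-root method can be written in closed form from a 2 × 2 table classifying individuals as positive or negative for antigen i and antigen j. Under Hardy-Weinberg proportions, the fraction of individuals lacking antigen i is (1 - p_i)^2, and the fraction lacking both antigens is (1 - p_i - q_j + h_ij)^2, so taking square roots yields p_i, q_j, and h_ij. The Python sketch below applies that textbook two-locus case and then computes Δ and Δrel as defined above; it does not reproduce the three-locus algorithm of Figure 1. The function name is hypothetical, and the bound used for negative Δ (Lewontin's) is an assumption, since the text considers only positive values.

```python
import math


def square_root_two_locus(a, b, c, d):
    """Two-locus square-root estimates from a 2 x 2 phenotype table.

    a: positive for both antigens i and j
    b: positive for i only
    c: positive for j only
    d: negative for both
    Returns (p_i, q_j, h_ij, delta, delta_rel).
    """
    n = a + b + c + d
    lacks_i = (c + d) / n       # estimates (1 - p_i)^2
    lacks_j = (b + d) / n       # estimates (1 - q_j)^2
    lacks_both = d / n          # estimates (1 - p_i - q_j + h_ij)^2
    p = 1.0 - math.sqrt(lacks_i)
    q = 1.0 - math.sqrt(lacks_j)
    h = 1.0 + math.sqrt(lacks_both) - math.sqrt(lacks_i) - math.sqrt(lacks_j)
    delta = h - p * q
    if delta > 0:
        # Largest attainable positive value; equals q(1 - p) when p > q.
        delta_max = min(p * (1.0 - q), q * (1.0 - p))
    else:
        # Assumed bound for negative values (Lewontin); not given in the text.
        delta_max = min(p * q, (1.0 - p) * (1.0 - q))
    delta_rel = delta / delta_max if delta_max > 0 else 0.0
    return p, q, h, delta, delta_rel


print(square_root_two_locus(2925, 2175, 675, 4225))
```

For this constructed table of 10,000 individuals, the estimates are approximately p = 0.3, q = 0.2, h = 0.15, and Δ = 0.09, giving Δrel = 0.09/0.14 ≈ 0.64.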
Calculation of genetic distance between populations was performed using the algorithm (15): Equation
where $D_{jk}$ is the genetic distance between populations j and k, and $p_{ji}$ and $p_{ki}$ are the frequencies of allele i in populations j and k, respectively.
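As an illustration of this kind of statistic, the sketch below computes the angular distance of Cavalli-Sforza and Edwards, θ = arccos Σ_i √(p_ji p_ki), which, like D_jk, is zero when two populations have identical allele frequencies. It is an assumed stand-in from the same family of distances, may differ from the formula used for Table 6, and is not expected to reproduce those values. The function name is hypothetical.

```python
import math


def angular_distance(freqs_j, freqs_k):
    """Angular genetic distance between populations j and k.

    freqs_j, freqs_k: dicts mapping allele name -> frequency (each summing
    to 1). Alleles absent from one population are treated as frequency 0.
    """
    alleles = set(freqs_j) | set(freqs_k)
    cos_theta = sum(math.sqrt(freqs_j.get(a, 0.0) * freqs_k.get(a, 0.0))
                    for a in alleles)
    # Guard against rounding pushing the sum slightly above 1.
    return math.acos(min(1.0, cos_theta))
```

The distance is 0 for identical frequency distributions and reaches its maximum, π/2, when the two populations share no alleles.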
The number of individuals in each group is given in Table 1, and allele frequencies are presented in Tables 2-4. We have reported elsewhere (13) on the antigen distributions for these populations and summarize here only briefly the features that apply to the allele distributions as well: (1) no alleles were unique to a single racial group; (2) the frequency of blanks was always higher in patients than in donors of the same race, was usually highest among Orientals of all the racial groups, and was highest for DR of the three loci; (3) the frequencies of broad antigens tended to be higher in patients than in donors of the same race, whereas the opposite was true for the splits or variants; and (4) the distribution of alleles differed between donors and patients of the same race as well as between donors (or patients) of different races, but the differences between races were greater than those between groups of the same race. The allele frequency data indicate that the increased frequency of phenotypic blanks (i.e., phenotypes with only one antigen at one or more loci) among patients and at the DR locus is due to an increased frequency of “blank” alleles rather than to increased homozygosity. The frequency of phenotypic blanks among donors declined steadily over the five-year collection period, which indicates that most of the “blank alleles” encoded antigens that were not recognized in the earlier tests.
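The distinction between a blank allele and homozygosity follows from Hardy-Weinberg proportions. Writing p_i for the frequency of the allele encoding antigen i and p_0 for the blank allele (notation introduced here for illustration), a phenotype showing only antigen i arises from either genotype i/i or i/blank, and the share attributable to homozygosity is:

```latex
P(\{i\}) = p_i^2 + 2 p_i p_0, \qquad
P(i/i \mid \{i\}) = \frac{p_i^2}{p_i^2 + 2 p_i p_0} = \frac{p_i}{p_i + 2 p_0}
```

For fixed p_i, a larger p_0 raises the frequency of single-antigen phenotypes while lowering the fraction of them that are true homozygotes.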
Goodness-of-fit tests revealed significant deviations from HWE in most cases (Table 5). These deviations occurred more frequently among patient groups (11 of 12 evaluations) than among donor groups (3 of 9 evaluations) and most frequently with DR phenotypes in both patients and donors.
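At a single locus with a blank allele, such a test compares the observed phenotype counts with those expected under HWE from the estimated allele frequencies. The Python sketch below is a minimal version, assuming the frequencies (blank included) have already been estimated and are all positive; it uses the Pearson statistic, does not reproduce the conversion of splits to broad antigens described in the Methods, and has a hypothetical function name. A P value can be obtained from the statistic and degrees of freedom with, for example, `scipy.stats.chi2.sf`.

```python
from itertools import combinations


def hwe_chi_square(phenotype_counts, freqs, blank="blank"):
    """Pearson chi-squared fit of serologic phenotype counts to HWE.

    phenotype_counts: dict mapping a sorted tuple of detected antigens to
    its observed count; () is the no-antigen class.
    freqs: allele frequencies including the blank allele (key `blank`),
    all assumed positive.
    Returns (chi_square, degrees_of_freedom).
    """
    n = sum(phenotype_counts.values())
    antigens = sorted(a for a in freqs if a != blank)
    p0 = freqs[blank]
    expected = {}
    for i, j in combinations(antigens, 2):           # i/j heterozygotes
        expected[(i, j)] = 2.0 * freqs[i] * freqs[j] * n
    for i in antigens:                                # i/i plus i/blank
        expected[(i,)] = (freqs[i] ** 2 + 2.0 * freqs[i] * p0) * n
    expected[()] = p0 ** 2 * n                        # blank/blank
    chi2 = sum((phenotype_counts.get(ph, 0) - e) ** 2 / e
               for ph, e in expected.items())
    # Classes minus 1, minus the number of independently estimated frequencies.
    df = len(expected) - 1 - len(antigens)
    return chi2, df


counts = {("A", "B"): 3000, ("A",): 4500, ("B",): 2100, (): 400}
print(hwe_chi_square(counts, {"A": 0.5, "B": 0.3, "blank": 0.2}))
```

Phenotype classes never observed still contribute through their expected counts, which is why the expected table is built from the allele list rather than from the observed phenotypes.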
We calculated genetic distances among several groups using the HLA-A allele frequencies obtained here for donors and patients of different races and the HLA-A frequencies from the 11th International Histocompatibility Testing Workshop (IWS) (16) for North African Blacks, South African Blacks, Zaireans, West African Blacks, and African-Americans. This statistic measures the difference in allele distribution between two populations and, in turn, the degree of genetic difference between them; when the allele frequencies are the same in both populations, $D_{jk} = 0$. The data are presented in Table 6. The genetic distance between the African-American allele frequencies from the 11th IWS and those from UNOS is 0.173. This value is greater than the distances between donors and patients of the same race, which are approximately 0.06 for each of the three races, and is comparable to the distances between African-American and Hispanic donors (0.177) and between Caucasoid and Hispanic donors (0.178). The African-American data from the 11th IWS and from UNOS therefore probably differ, in part because the 11th IWS typings were more recent and had better antigen definition than some of those in the UNOS registry. Nonetheless, these data further indicate that the allele distribution among donors more closely resembles that of patients of the same race (genetic distances of 0.061-0.068) than that of patients or donors of different races. The greatest distances were between African-Americans and Caucasoids (0.368), while the distances between Hispanics and either African-Americans (0.177) or Caucasoids (0.178) were intermediate.
The twenty-five most common A;B, B;DR, and A;B;DR haplotypes in each population evaluated are presented in Tables 7-12. These data have several interesting features. Not surprisingly, among the most frequent haplotypes more are shared between donors and recipients of the same race than between groups of different races. However, several haplotypes were shared between different racial groups. When one compares African-American, Hispanic, and Caucasoid donors, there were six A;B haplotypes common to all three groups and eight common to at least two groups. Similarly, for B;DR, seven haplotypes occurred in all three groups and ten in two groups, and for A;B;DR there were four and six haplotypes shared by three and two groups, respectively.
The distribution of alleles among these common haplotypes was mostly as expected from the allele frequencies. For example, A2 was present in nearly half the haplotypes in all groups. Less common antigens, such as A34 and A36 in African-Americans, A11 and A29 in Caucasoids, and B39 in Hispanics, occurred in haplotypes in which linkage disequilibrium contributed appreciably to the haplotype frequency. Some linkage disequilibria differed between patients and donors in their numerical value, their relative value (Δrel), their sign (positive or negative), or some combination of these; examples are provided in Table 13. Linkage disequilibrium patterns for haplotypes with positive linkage disequilibrium values differed between African-Americans and Caucasoids. Among the 100 most common A;B and B;DR haplotypes for donors and patients of both races, linkage disequilibrium was weaker among African-Americans than among Caucasoids: among African-Americans, 66-82% of these haplotypes had Δrel values <0.1 and 3-7% had values >0.3, whereas among Caucasoids, 31-63% had values <0.1 and 13-26% had values >0.3. The same trend held for the A;B;DR haplotypes, although Δrel values were lower overall.
We examined the distribution of haplotype frequencies among the different groups. Table 14 presents, for A;B, B;DR, and A;B;DR haplotypes, the total number of haplotypes detected in each group and the number and percentage of haplotypes falling within each frequency range. For example, among African-American donors, 14 haplotypes had frequencies in the range 0.01-0.04999, which was 2.5% of the 572 haplotypes observed in that group. For all three categories of haplotype, the frequency distribution shifted among the racial groups. Caucasoid donors had more haplotypes at both the high and the low ends of the frequency range than did the other groups, and the cumulative frequency of the haplotypes at the high end of the range was greater for Caucasoids than for the other groups. For example, African-American, Caucasoid, and Hispanic donors had 14, 16, and 15 HLA-A;B haplotypes, respectively (Table 7), in the two most frequent categories (frequency ≥ 0.01), with cumulative frequencies of 22.4%, 43.2%, and 30.1%. This trend occurred in both patient and donor groups for the A;B, B;DR, and A;B;DR haplotypes (Table 15).
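Tabulating haplotypes by frequency range, as in Table 14, is a binning step. The Python sketch below assumes a dictionary of estimated haplotype frequencies and hypothetical range cut points (only the 0.01-0.04999 range is named in the text); for each range it reports the number of haplotypes, their percentage of all haplotypes observed, and their summed frequency. The function name and example haplotypes are hypothetical.

```python
def frequency_distribution(haplotype_freqs, lower_bounds):
    """Tabulate haplotypes by frequency range.

    haplotype_freqs: dict haplotype -> estimated frequency.
    lower_bounds: descending lower bounds of the ranges, e.g.
        [0.05, 0.01, 0.001, 0.0]; each range runs from its lower bound up to
        (but not including) the previous bound.
    Returns a list of (lower_bound, count, percent_of_haplotypes,
    summed_frequency) rows, one per range.
    """
    total = len(haplotype_freqs)
    rows = []
    for k, lo in enumerate(lower_bounds):
        hi = lower_bounds[k - 1] if k > 0 else float("inf")
        in_range = [f for f in haplotype_freqs.values() if lo <= f < hi]
        rows.append((lo, len(in_range), 100.0 * len(in_range) / total,
                     sum(in_range)))
    return rows


example = {"h1": 0.06, "h2": 0.02, "h3": 0.015, "h4": 0.005, "h5": 0.0005}
for row in frequency_distribution(example, [0.05, 0.01, 0.001, 0.0]):
    print(row)
```

Summing the last column over the top ranges gives the cumulative frequency of the most common haplotypes, the quantity compared across racial groups above.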
Examination of the allele distributions in donors and end-stage renal disease patients of each race revealed differences similar to those we reported earlier for the antigen distributions (13). The frequency of the blank allele, estimated by maximum likelihood, indicates that the higher frequency of single-antigen phenotypes at a locus in patients than in donors is due largely to a higher frequency of the blank allele rather than to increased homozygosity. However, the frequencies of the blank allele are probably overestimated in all groups and at all loci, and to a greater extent at the DR locus. We reported previously (13) that the frequency of phenotypic blanks (i.e., phenotypes with a single antigen at a locus) in the yearly cohorts of donors decreased steadily from 1988 to 1992. Similarly, the frequencies of the blank alleles for donors typed in 1991-1992 were 20-33% lower than those for donors from 1988-1992, depending on the locus and race (data not shown), most likely because of ongoing improvements in typing capabilities.
Tests for goodness of fit revealed significant deviations from Hardy-Weinberg equilibrium in many cases. Such deviations can result from many factors, including misassignment of phenotypes, racial admixture, misclassification of race, consanguinity, and selection. We believe phenotype misassignment contributed substantially to the deviations from HWE. Over time, the identification of new antigens, the increasing availability of reagents, and improvements in technology have clarified some phenotype assignments, and errors in data entry may have produced other misassignments. To test this explanation, we performed goodness-of-fit tests on data from the African-American and Caucasoid donors typed in 1991-1992. The P values for the tests of A, B, and DR were 0.92, 0.26, and 0.50, respectively, for African-Americans and 0.56, 0.06, and 0.04, respectively, for Caucasoids. Five of these six values are higher than those obtained for the 1988-1992 African-American (0.60, 0.003, 0.04) and Caucasoid (0.11, 0.15, <0.0001) donors (Table 5). This indicates that misassignment of phenotypes was the primary cause of the deviations from HWE and suggests that periodic reassessment of the registry data is likely to yield increasingly accurate frequency estimates. Admixture may also have contributed to the observed deviations, because the racial categories used in the registry are broad groups, each containing individuals of different geographic and ethnic ancestry. A third, but unlikely, contributor to the deviations from HWE is selection. Aside from the possible protection against severe malaria attributed to certain HLA antigens (17), there is no firm evidence of selection in any of the diseases with known HLA associations.
However, we note that the deviation from HWE is greater among patients than among donors of the same race and is especially marked among Caucasoid patients, many of whom have insulin-dependent diabetes mellitus, which is associated with DR3, DR4, and the DR3,4 genotype. Despite the deviations from HWE observed in these groups, the allele and haplotype frequencies reported here are similar to those reported elsewhere (16, 18).
The estimated three-locus haplotype frequencies are consistent with our earlier observations on the distribution of phenotypes (13). Among Caucasoids, a few phenotypes are seen multiple times, and these phenotypes consist of a few relatively common haplotypes (i.e., frequencies of 2-5%). Most haplotypes are extremely rare (<0.0001), resulting in a high degree of population heterogeneity at the phenotype level; with most haplotypes being extremely rare, it is not surprising that most phenotypes are also rare.
Differences among races in the distribution of haplotypes are important in considerations of access to transplantation. Six-antigen matches occur predominantly among Caucasoids because a few phenotypes occur repeatedly in this group. However, although more haplotypes are shared between donors and recipients of the same race, some haplotypes are among the most common in two or three racial groups, providing opportunities for transplants with no mismatches at two or even three loci for all racial groups. What matters most is the frequency of these shared haplotypes. For example, of the 25 most common HLA-A;-B haplotypes in each group, 18 are shared between African-American donors and African-American patients, but only 6 are shared between Caucasoid donors and African-American patients. However, the cumulative frequency of the 18 shared haplotypes among African-American donors is 0.2598, and that of the 6 shared haplotypes among Caucasoid donors is 0.2595. Thus, the frequency of donors whose phenotypes consist of haplotypes common among African-American patients is 6.75% among African-American donors and 6.65% among Caucasoid donors. This is particularly important for African-Americans, who as a group have reduced access to transplantation because of a higher level of presensitization to HLA antigens (19-21).
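The step from cumulative haplotype frequency to donor frequency can be made explicit: if haplotypes combine at random, a donor carries two haplotypes from a set with cumulative frequency H with probability H^2, and 0.2598^2 ≈ 0.0675 matches the 6.75% given for African-American donors. The Python sketch below performs the set intersection of two top-haplotype lists and this calculation; the function name and the haplotype labels in the example are hypothetical.

```python
def shared_haplotype_summary(patient_haplotypes, donor_freqs):
    """Cumulative donor frequency of haplotypes shared with a patient list.

    patient_haplotypes: iterable of haplotypes common among patients
        (e.g. the 25 most frequent).
    donor_freqs: dict haplotype -> frequency among donors (e.g. the donors'
        25 most frequent haplotypes).
    Returns (number_shared, cumulative_frequency, donor_fraction), where
    donor_fraction = cumulative_frequency ** 2 is the expected fraction of
    donors carrying two shared haplotypes, assuming random union of haplotypes.
    """
    shared = set(patient_haplotypes) & set(donor_freqs)
    cumulative = sum(donor_freqs[h] for h in shared)
    return len(shared), cumulative, cumulative ** 2


print(shared_haplotype_summary(
    ["A2;B44", "A1;B8", "A3;B7"],
    {"A2;B44": 0.05, "A1;B8": 0.08, "A29;B44": 0.02},
))
```

Because the result depends on the summed frequency rather than on the number of shared haplotypes, a few common shared haplotypes can provide as many suitable donors as many rarer ones, which is the point of the comparison above.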
The number of different haplotypes observed (i.e., those for which frequencies are estimated) is a function of the sample size, the distribution of haplotype frequencies, and the estimation process. The number of haplotypes theoretically possible in a population of infinite size equals the product of the numbers of alleles occurring at each locus. Because some haplotypes are rare, the groups studied here were too small to observe all theoretically possible haplotypes. For all groups, the numbers of different haplotypes estimated, expressed as percentages of the numbers theoretically possible, were less than 50% for A;B, 50-67% for B;DR, and 31-35% for A;B;DR. Although the number of haplotypes increased with population size, the relationship was not linear; rather, the number appeared to approach the maximum asymptotically. In an analysis of the NMDP registry data (12), the numbers of A;B and A;B;DR haplotypes detected were 66% and 64%, respectively, of the numbers theoretically possible. However, these percentages were for the combined populations of different races and may overestimate those for individual racial groups. Haplotypes may be absent from a sample either because the sample is too small to include all possible haplotypes or because the haplotypes do not exist in the population. The data presented here and by Milford et al. (12) suggest that it is unlikely that all haplotypes possible in a population will be observed.
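Why the count approaches its maximum asymptotically can be seen from a simple sampling model, which is an assumption introduced here and not the analysis used in this study: if each of n sampled chromosomes (two per individual) carries haplotype h with probability f_h, that haplotype is observed at least once with probability 1 - (1 - f_h)^n, so the expected number of distinct haplotypes is the sum of these terms. Rare haplotypes contribute little until n is of the order of 1/f_h. The function names and the skewed example distribution are hypothetical.

```python
import math


def expected_distinct_haplotypes(haplotype_freqs, n_chromosomes):
    """Expected number of distinct haplotypes observed among n_chromosomes
    drawn independently with the given population frequencies."""
    return sum(1.0 - (1.0 - f) ** n_chromosomes for f in haplotype_freqs)


def theoretical_haplotype_count(alleles_per_locus):
    """Haplotypes possible: product of the number of alleles at each locus."""
    return math.prod(alleles_per_locus)


# Hypothetical skewed distribution: 1,000 haplotypes with frequency
# proportional to 1/rank, so a few are common and most are rare.
weights = [1.0 / rank for rank in range(1, 1001)]
total = sum(weights)
freqs = [w / total for w in weights]
for n in (1_000, 10_000, 100_000):
    print(n, round(expected_distinct_haplotypes(freqs, n)))
```

The printed counts can never exceed the 1,000 haplotypes in the model, so each tenfold increase in sample size yields a progressively smaller gain, mirroring the non-linear relationship described above.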
The differences in linkage disequilibrium values for certain haplotypes between patients and donors of the same race are interesting. However, the population sizes are not large enough to exclude the possibility that these differences are due to chance. Also, relative linkage disequilibrium values for alleles with positive associations tended to be greater among Caucasoids than among African-Americans.
In summary, the data presented here represent allele and haplotype frequencies from large populations of North American Caucasoids and African-Americans and are the first such data for end-stage renal disease patients. Although the accuracy of the frequency estimates diminishes for very rare haplotypes, comparisons with similar data from other sources indicate that these data are, overall, reasonable estimates of haplotype frequencies that may be applied for various purposes, such as predictive modeling in transplantation. These data, along with those of our previous report on HLA antigens and phenotypes (13), demonstrate that the large disparity between African-Americans and Caucasoids in the frequency of phenotypically identical cadaver renal transplants is due to the repeated occurrence of a small number of HLA phenotypes among Caucasoid donors and Caucasoid end-stage renal disease patients; all other phenotypes are extremely rare. Certain haplotypes, particularly two-locus haplotypes, are common to all racial groups, so some patients of all races have a reasonable chance of receiving a transplant with no mismatched antigens at two loci. This is particularly important for highly sensitized patients, for whom a donor with no mismatched HLA-A and HLA-B antigens may be the only opportunity for a transplant. However, such transplants would occur with any appreciable frequency only if these patients had access to the national, or at least regional, donor population.
This work was supported in part by a contract from the United Network for Organ Sharing.
Abbreviations: HWE, Hardy-Weinberg equilibrium; NMDP, National Marrow Donor Program; UNOS, United Network for Organ Sharing.
An explanation of linkage disequilibrium is provided in the Materials and Methods section.
1. Terasaki PI, Tiwari JL. HLA and disease associations. New York: Springer, 1985.
2. Mitchell MS, Harel W, Groshen. Association of HLA phenotype with response to active specific immunotherapy of melanoma. J Clin Oncol 1992; 10: 1158.
3. Egea E, Iglesias A, Salazar M, et al. The cellular basis for lack of antibody response to hepatitis B vaccine in humans. J Exp Med 1991; 173: 531.
4. Hatae K, Kimura A, Okubo R, et al. Genetic control of nonresponsiveness to hepatitis B virus vaccine by an extended HLA haplotype. Eur J Immunol 1992; 22: 1899.
5. Patarroyo ME, Vinasco J, Amador R, et al. Genetic control of the immune response to a synthetic vaccine against Plasmodium falciparum. Parasite Immunol 1991; 13: 509.
6. Romagnoli P, Takacs B, Kilgus J, Pink JRL, Sinigaglia F. Peptide-MHC interaction: a rational approach to vaccine design. Int Rev Immunol 1990; 6: 61.
7. Zachary AA, Braun WE. Calculation of a predictive value for transplantation. Transplantation 1985; 39: 316.
8. Baur MP, Danilovs JA. Population analysis of HLA-A,B,C,DR and other genetic markers. In: Terasaki PI (ed): Histocompatibility testing 1980. Los Angeles: UCLA Tissue Typing Laboratory, 1980.
9. Hart JM, Zemmour J, Schmeckpeper BJ, et al. The occurrence of HLA-B46 in two Caucasoid families. Tissue Antigens 1993; 41: 47.
10. Page-Bright B. Proving paternity-human leukocyte antigen test. J Forensic Sci 1982; 27: 135.
11. Terasaki PI, Bernoco D, Gjertson D, Mickey MR, Perdue S. Ninety-five percent probability of paternity with HLA, ABO and haptoglobins. Forensic Sci Intern 1978; 12: 227.
12. Milford EL, Mori M, Graves M, Beatty P. HLA gene and haplotype frequencies in the North American population: the National Marrow Donor Program Donor Registry. Transplantation (in press).
13. Leffell MS, Steinberg AG, Bias WB, Machan CH, Zachary AA. The distribution of HLA antigens and phenotypes among donors and patients in the UNOS registry. Transplantation 1994; 58: 1119.
14. Piazza A. Haplotype and linkage disequilibria from the three-locus phenotypes. In: Kissmeyer-Nielsen F (ed): Histocompatibility testing 1975. Copenhagen: Munksgaard, 1975: 923.
15. Cavalli-Sforza LL, Bodmer WF. The genetics of human populations. San Francisco: Freeman, 1971.
16. Imanishi T, Akaza T, Kimura A, Tokunaga K, Gojobori T. Allele and haplotype frequencies for HLA and complement loci in various ethnic groups. In: Tsuji K, Aizawa M, Sasazuki T (eds): HLA 1991: Proceedings of the 11th International Histocompatibility Workshop and Conference, vol. 2. New York: Oxford University, 1992.
17. Hill AVS, Allsopp CEM, Kwiatkowski D, et al. Common West African HLA antigens are associated with protection from severe malaria. Nature 1991; 352: 595.
18. Osborne LC, Mason JM. HLA-A/B haplotype frequencies among US Hispanic and African-American populations. Hum Genet 91: 326.
19. Kallich JD, Adams JL, Barton PL, Spritzer KL. Access to cadaveric kidney transplantation. Santa Monica, CA: RAND, 1993.
20. Sanfilippo FP, Vaughn WK, Peters TG. Factors affecting the waiting time of cadaveric kidney transplant candidates in the United States. JAMA 1992; 267: 247.
21. Thompson JS. American Society for Histocompatibility and Immunogenetics crossmatch study. Transplantation 1995; 59: 1636.