The widespread uptake of next generation sequencing is transforming the genetic information available for understanding the aetiology of schizophrenia. It is helpful at this point to review our understanding of the ways in which genetic factors may influence risk of disease, because the insights gained may inform choices about which investigative approaches are likely to be most productive in future.
To begin with, we should summarize a few key findings which have emerged to date. These are deliberately stated in very approximate form and are intended only to inform further discussion. The findings to be noted are as follows:
Schizophrenia is an illness with prevalence of ∼1%. There are occasional pockets of higher prevalence and possibly some correlation with increased latitude and parental consanguinity (Dobrusin et al., 2009; Kinney et al., 2009; Mansour et al., 2010).
Environmental factors, including cannabis use, maternal influenza infection and maternal malnutrition, have a moderate effect on risk (Van Os et al., 2002; St Clair et al., 2005; Brown, 2012).
Genetic factors have a major effect on risk, with monozygotic twin concordance around 40%, whereas risk to dizygotic twins and other first-degree relatives of affected probands is in the region of 10% (Kringlen, 2000).
Although it is possible to find pedigrees with multigenerational, apparently unilineal inheritance, repeated linkage studies have failed to identify regions in which markers clearly and consistently associate with disease (Sullivan et al., 2012). However, there is a single example of a pedigree in which a balanced translocation disrupting DISC1 does segregate through several generations and is associated with increased risk of schizophrenia and other mental disorders (Muir et al., 1995).
Repeated genome-wide association studies (GWAS) using common single nucleotide polymorphisms (SNPs) [minor allele frequency (MAF)>0.01] have failed to identify any markers associated with a substantial effect (odds ratio>1.5) on risk (Ripke et al., 2011; Sullivan et al., 2012).
Analytic methods considering large numbers of SNPs in combination indicate that thousands of markers are very weakly (odds ratio<1.05) associated with risk (Purcell et al., 2009; Sullivan et al., 2012).
Copy number variants (CNVs) at 22q11.21 and a few other locations are consistently associated with a major risk of schizophrenia (Murphy et al., 1999; Itsara et al., 2009; Levinson et al., 2011). A few CNVs are found rarely in schizophrenic individuals and almost never in unaffected individuals, and may exert a very substantial effect (Rujescu et al., 2009; Vacic et al., 2011; Sullivan et al., 2012).
In addition to the small number of regions repeatedly found to harbour CNVs, the overall burden of large, rare CNVs is increased in schizophrenic individuals across the genome (ISC, 2008; Sullivan et al., 2012).
Follow-up targeted association studies of genes implicated by linkage studies, GWAS and chromosomal abnormalities have failed to unequivocally identify variants which have a major effect on risk.
There is some evidence that de-novo mutations may account for a proportion of cases (Xu et al., 2008; Malhotra et al., 2011; Sullivan et al., 2012).
Implications for genetic architecture
What conclusions may be drawn from these findings about the underlying genetic architecture of schizophrenia? From the outset, we should state that heterogeneity of genetic effects is to be expected, rather than a single mode of transmission accounting for all or even most cases of disease. With that said, we propose that the following statements are plausible, again emphasizing that any quantitative measures are deliberately very approximate. This is partly to present more clearly the basic concepts to be conveyed and partly because the information available only allows for approximate inferences.
Effect size of variants
The results from GWAS suggest that there are variants associated with a minor effect on risk (Ripke et al., 2011; Sullivan et al., 2012). A crucial question is whether there are any variants which have a major effect. One argument in favour of the existence of such variants is that some chromosomal abnormalities, including both the DISC1 translocation and certain CNVs, are associated with a substantially increased risk (Murphy et al., 1999; Rujescu et al., 2009; Vacic et al., 2011). Also, it is possible to find some densely affected pedigrees which give the appearance of segregating a major gene, although it is always difficult to quantify how much ascertainment can account for this (Kalsi et al., 1999). Finally, from a clinical point of view schizophrenia does not in general present as being a smoothly distributed quantitative trait. Of course there are some individuals in whom the diagnosis is unclear, there are some individuals more severely affected than others and there are some unaffected individuals who may demonstrate some psychotic traits (Rossler et al., 2007). However, in general, schizophrenia fits better to an ‘illness’ model in which a small proportion of the population are substantially affected. It is possible to argue for a liability-threshold model in which an accumulation of small risk factors leads eventually to some catastrophic process, but overall we might well expect that some major risk factors could exert an effect.
Arguably, there is an a priori expectation that there may be some genetic variants exerting a major effect and the question which needs to be asked is whether the results obtained to date exclude this possibility.
Role of de-novo mutations
There is at least suggestive evidence that a proportion of cases may be due to de-novo CNVs (Xu et al., 2008; Malhotra et al., 2011; Sullivan et al., 2012). It is certainly possible that other, less easily detected, de-novo mutations may also confer risk. This raises the possibility that all of schizophrenia might be due to de-novo mutations. An argument against this is the recurrence of schizophrenia among relatives: there must be at least some transmission of risk or else there would be no familial clustering. A modified version of this hypothesis would be to propose that de-novo mutations occur which substantially increase the risk of schizophrenia and which segregate through families, but that they disappear after a few generations through selection effects. This would lead to a situation in which all major risk variants were essentially ‘private’ to families, if not to individuals. At the time of writing there is perhaps not much evidence against this hypothesis. One could point to the fact that there do seem to be a few geographical regions where rates of illness are especially high, and these would indicate the presence of local risk variants persisting over time. Likewise, some association findings from GWAS do seem fairly consistent across large numbers of samples, implying that there are at least some susceptibility alleles which survive in the population. However, it is difficult to rule out the possibility that a large proportion of schizophrenia may be maintained at a relatively constant rate through a dynamic process whereby new mutations appear by chance and then subsequently disappear through selection. If this is the case, one might well imagine that there could be major variants with a strongly deleterious effect which might not persist for long, whereas other variants with a more benign effect might be less affected by selection pressures and might be relatively persistent.
Common variants with major effect
There are probably no individually common (MAF>0.01) variants which exert a substantial effect on risk (Ripke et al., 2011; Sullivan et al., 2012). Any such variants close to SNP markers would have been detected by the GWAS which have been performed. If any common risk variants do exist then they would have to be in regions not covered by standard SNP sets.
Number of major genes
The number of genes having a major effect on risk is unlikely to be very small.
It is reasonable to suspect that if there were only a handful (<6) of genes with major effects then, even with extreme allelic heterogeneity, at least one would have been identified by now through the originally conceived approach of linkage studies followed by targeted association studies.
The fact that the overall burden of large CNVs is increased across the genome strongly suggests that there are multiple sites across the genome where disruption can cause a substantial increase in risk (ISC, 2008).
Likewise, the finding that large numbers of SNPs are weakly associated with disease without any obvious clustering also suggests that there cannot be a very small number of major genes (Purcell et al., 2009). Conversely, it is important to point out that the weak association of large numbers of SNPs does not imply that there must be large numbers of genes with individually small effect. To illustrate this, take a simple example in which there are 10 genes influencing susceptibility, in each of which any of 10 dominant risk variants may increase the probability of affection from 0.01 to 0.1. If each of these variants had a population frequency of 0.0005 then together they would produce a population prevalence of 0.01. If each of these 100 variants were in complete linkage disequilibrium with 20 SNPs, each having MAF=0.1, then individuals with the minor allele of each SNP would have a risk of schizophrenia of (0.01×(0.1–0.0005)+0.1×0.0005)/0.1=0.01045. Thus, one might observe 2000 SNPs associated with a relative risk of 0.01045/0.01=1.045 even if as few as 10 genes with a total of 100 risk variants were involved. This is intended to be just a simple example to illustrate the general point that a small number of rare major risk variants can lead to the finding of large numbers of weakly associated markers. Of course, to match the published findings one would need to add the condition that the SNPs should not be in strong linkage disequilibrium (LD) with each other, because such SNPs were excluded from analysis. One could argue that the associated SNPs would tend to cluster around the disease genes, but one should bear in mind that there would be very large numbers of SNPs showing similar levels of association entirely by chance scattered across the genome. Hence, any signal arising from the truly associated SNPs might well not be obvious in the general noise.
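The arithmetic of this example can be checked with a short Python sketch. It works per haplotype and assumes, as the example does, that each risk variant lies only on haplotypes carrying the minor allele of its tagging SNP, taking the tagging allele frequency of 0.1 that enters the formula; the function name `snp_relative_risk` is ours, for illustration only.

```python
def snp_relative_risk(background_risk, variant_risk, variant_freq, snp_maf):
    """Relative risk for carriers of a SNP minor allele in complete LD with a
    rarer risk variant, i.e. the variant occurs only on minor-allele haplotypes.
    Works per haplotype, ignoring genotype-level detail."""
    if not 0 < variant_freq <= snp_maf:
        raise ValueError("variant must be no commoner than the tagging allele")
    # Among minor-allele haplotypes, a fraction variant_freq / snp_maf carry the variant.
    risk_given_minor = (background_risk * (snp_maf - variant_freq)
                        + variant_risk * variant_freq) / snp_maf
    return risk_given_minor / background_risk


# Values from the worked example.
n_genes, variants_per_gene, snps_per_variant = 10, 10, 20
rr = snp_relative_risk(background_risk=0.01, variant_risk=0.1,
                       variant_freq=0.0005, snp_maf=0.1)
print(n_genes * variants_per_gene * snps_per_variant, f"{rr:.3f}")

# A rarer tagging allele (MAF 0.01) dilutes the same variant much less.
print(f"{snp_relative_risk(0.01, 0.1, 0.0005, 0.01):.2f}")
```

The second call shows why the frequency of the tagging allele matters: the rarer the tag, the less the risk variant is diluted and the larger the apparent effect.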
Although 10 genes and 100 variants might theoretically suffice to produce the observed results, in reality one suspects that the numbers are likely to be higher.
Results to date are probably consistent with the notion that major effects on risk might occur through variants in some tens or hundreds of genes.
Number of risk variants
Although one can conceive that only a modest number of genes could together account for a significant proportion of the genetic risk of schizophrenia, it is likely that the number of individual variants is high, running into the hundreds or thousands. Two lines of argument support this. First, if we take as examples two rare, Mendelian diseases, phenylketonuria and presenile Alzheimer’s disease, we observe that hundreds of different mutations have been identified as pathogenic (Jayadev et al., 2010; Sterl et al., 2012). If this is the case for such rare, phenotypically homogeneous diseases then one might expect that at least as many variants would make contributions to the risk of schizophrenia, which is common and heterogeneous. Second, if there were variants which were individually common then one might expect that some would have been identified by now.
Because of the implications for analytic approaches (as addressed below) it is important to consider whether recessive effects might make a significant contribution to risk.
From a biological point of view, it is plausible that recessive effects could be present. It makes sense that damaging both copies of a gene, so that there is no functional product, could lead to disease, and indeed that this might be more likely to be pathogenic than damage to just one copy, which would lower gene dosage through haploinsufficiency.
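It could be argued that a recessive mode of transmission is more consistent with the observation that the risk to siblings is less than half the concordance rate for identical twins (Kringlen, 2000): a monozygotic co-twin always shares the risk genotype, whereas a sibling of a proband shares it with probability about 1/2 for a rare dominant allele but only about 1/4 for a rare recessive genotype.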
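The sibling-versus-twin argument can be made concrete with a minimal Python sketch, assuming a single biallelic locus in Hardy–Weinberg equilibrium, random mating and a fixed penetrance for the risk genotype with no phenocopies; under those assumptions the ratio of sibling risk to MZ concordance equals the probability that a sibling shares the proband's risk genotype. This is a deliberately simplified model for illustration, not a model of schizophrenia, and the helper names are hypothetical.

```python
from itertools import product


def hwe(q):
    """Genotype frequencies for 0, 1 or 2 copies of the risk allele under HWE."""
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]


def child_risk(g_mother, g_father, mode):
    """Probability that a child of these parents has the risk genotype."""
    t_m, t_f = g_mother / 2, g_father / 2  # chance each parent transmits the risk allele
    if mode == "recessive":
        return t_m * t_f
    if mode == "dominant":
        return 1 - (1 - t_m) * (1 - t_f)
    raise ValueError(mode)


def sib_recurrence(q, mode):
    """P(sibling has the risk genotype | proband has it), random mating assumed.
    MZ co-twins share it with probability 1, so this is also the sib:MZ ratio."""
    freqs = hwe(q)
    both = one = 0.0
    for g_m, g_f in product(range(3), repeat=2):
        weight = freqs[g_m] * freqs[g_f]
        r = child_risk(g_m, g_f, mode)
        both += weight * r * r  # two sibs, independent given the parents
        one += weight * r
    return both / one


for mode in ("dominant", "recessive"):
    print(mode, round(sib_recurrence(0.001, mode), 3))
```

For a rare allele the two modes give ratios close to 1/2 and 1/4 respectively.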
Extended pedigrees which have been found and used for linkage analysis appear to be more consistent with dominant than recessive modes of transmission, including the pedigree segregating the DISC1 translocation. However, this does not rule out the possibility that recessive effects might be active in other families and indeed it may be the case that some of the pedigrees which appear dominant are in fact segregating recessively acting alleles, with other risk alleles being contributed by carriers marrying in to the pedigree. Without consanguinity, this explanation would only be plausible if carriers were not very rare.
Although schizophrenia is common in societies where consanguinity rates are low, there are some studies which show that the risk is increased in the children of consanguineous marriages (Dobrusin et al., 2009; Mansour et al., 2010).
At this point it is worth explicitly considering a very simple recessive transmission model which might apply to schizophrenia. We should recognize that, compared with classical recessive disorders such as phenylketonuria or cystic fibrosis, schizophrenia is very common. Suppose that there is a gene such that, if it is inactivated, the risk of developing schizophrenia rises to 50%. Suppose that there are a large number of different mutations of this gene present in the population, so that each mutation is individually very rare. Suppose that recessive inactivation of this gene in individuals carrying two mutated versions accounts for 10% of cases of schizophrenia. Then we have the following situation: the population prevalence is 0.01; of these cases, 10% have two inactivated copies of this gene, that is 0.001 of the population; as the risk of developing schizophrenia with no working copy of the gene is only 0.5, the proportion of the population actually having two inactivated copies is 0.002; assuming Hardy–Weinberg equilibrium (HWE), the combined frequency of all inactivating mutations in this gene is $\sqrt{0.002} \approx 0.04$; given this allele frequency, the proportion of the population who are carriers of a deleterious mutation would be approximately 0.08. Thus, with these example values one can conceive a situation in which 0.1% of the population have schizophrenia because of the recessive inactivation of this gene, whereas 8% are carriers. If the risk of developing disease, that is the penetrance, were lower than 0.5 then the carrier frequency would be even higher. In contrast, if less than 10% of schizophrenia were accounted for by recessive inactivation of this gene then the carrier rate would be lower. Nevertheless, it is clear that it would not be completely implausible for at least some pedigrees appearing to demonstrate dominant transmission to be in fact manifesting recessive effects, resulting from carriers marrying in to the pedigree.
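The chain of calculations in this example can be reproduced as follows. This Python sketch assumes HWE, as the text does, but does not round the allele frequency to 0.04, so it gives a carrier frequency nearer 8.5% than 8%; the function name is hypothetical.

```python
import math


def recessive_carrier_frequency(prevalence, fraction_of_cases, penetrance):
    """Homozygote frequency, cumulative allele frequency and carrier frequency
    for a recessively acting gene, assuming Hardy-Weinberg equilibrium."""
    affected_homozygotes = prevalence * fraction_of_cases  # 0.001 in the example
    homozygotes = affected_homozygotes / penetrance        # 0.002 in the example
    q = math.sqrt(homozygotes)                             # cumulative allele frequency
    carriers = 2 * q * (1 - q)                             # heterozygote frequency
    return homozygotes, q, carriers


hom, q, carriers = recessive_carrier_frequency(0.01, 0.10, 0.5)
print(f"homozygotes={hom:.4f} q={q:.4f} carriers={carriers:.3f}")

# Lower penetrance requires more homozygotes, hence more carriers.
print(f"{recessive_carrier_frequency(0.01, 0.10, 0.25)[2]:.3f}")
```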
A hallmark of recessive diseases is that they are commoner in the offspring of consanguineous matings. There is some evidence that this may be the case for schizophrenia (Dobrusin et al., 2009; Mansour et al., 2010). However, it is worth considering what the magnitude of the effect might be. Suppose that there is a gene which, if recessively inactivated, produces a significantly increased risk of disease, and suppose that there are a number of inactivating mutations present in the population, possibly individually rare, having a cumulative frequency of $p$. If we consider the child of a first-cousin marriage, then the probability that the copy of the gene inherited from the mother bears an inactivating mutation is $p$. The probability that the child inherits the same mutation, identical by descent, from the father is then 1/16. The probability that the father passes on a copy which is not identical by descent but which coincidentally also bears an inactivating mutation is $p \times 15/16$. Thus, the overall probability for a child of a first-cousin marriage to inherit two inactivated copies of the gene is $p(1/16 + 15p/16)$. This compares with the probability of $p^2$ for a child of unrelated parents to have two inactivated copies of the gene. The increase in the risk for a child to carry two inactivated copies if their parents are first cousins is thus $p(1/16 + 15p/16)/p^2 = 1/(16p) + 15/16$. If we consider a very rare, Mendelian, recessive disease with small $p$, then the risk for children of consanguineous marriages will be increased many times and it will be quite unusual for cases to occur in offspring of couples who are not related. This would be the case for schizophrenia if a susceptibility allele with a recessive effect were rare. However if, as in the example above, we set $p = 0.04$, then the relative risk is only about 2.5-fold.
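A minimal Python version of this calculation, taking the inbreeding coefficient of 1/16 for the offspring of first cousins and treating all inactivating alleles as a single class with cumulative frequency $p$ (both as in the text; the function name is ours):

```python
def first_cousin_relative_risk(p, inbreeding=1 / 16):
    """How many times more likely the child of first cousins is to carry two
    inactivated copies than the child of unrelated parents.
    p: cumulative frequency of inactivating alleles;
    inbreeding: chance the two copies are identical by descent (1/16 here)."""
    consanguineous = p * (inbreeding + (1 - inbreeding) * p)
    unrelated = p * p
    return consanguineous / unrelated  # equals 1/(16p) + 15/16 for first cousins


for p in (0.001, 0.01, 0.04):
    print(p, round(first_cousin_relative_risk(p), 2))
```

With $p = 0.04$ the ratio is 2.5, whereas for $p = 0.001$ it exceeds 60, which is the contrast drawn above between rare Mendelian recessive disorders and a common recessive risk class.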
It is quite possible that such a relatively modest increase in risk for children of consanguineous marriages would not have been obvious from some epidemiological studies. This is especially the case if we consider the possibility that recessive risk factors might be present but might only account for a proportion of cases, so that any increase in parental consanguinity might have been diluted by other mechanisms which might also be active in some individuals, such as dominantly acting risk factors and de-novo mutations.
Summary of implications of results to date
It is not plausible that schizophrenia is caused by a very small number of mutations in a very small number of genes. However, it remains a possibility that there could be some genetic variants which have a substantial effect on risk. Most, perhaps all, of these might be individually rare (MAF≪0.01), but there might be particular risk genes within which their cumulative frequencies could be higher, especially if some variants acted recessively.
Implications for analytic approaches applied to next generation sequencing data
If one admits the possibility that variants with major effect on risk may exist, then it becomes appropriate to consider which approaches might yield the best chance of recognizing them. Whole genome or whole exome sequences are becoming available for hundreds of individuals, providing genotypes for millions of variants across thousands of genes. In this context, the task of a statistical analysis is in the first instance to focus attention on findings of potential interest rather than to produce a very accurate, arduously calculated P value. Inevitably, any preliminary results will not stand on their own but will need to be followed up in a number of ways before being judged to be biologically meaningful. Such follow-up would include: checking that genotype calls are accurate; searching variant databases; making a judgment about the likely biological consequences of variants; genotyping replication samples; and carrying out functional studies.
It may be expected that no individual variant will be present in more than around 1% of affected individuals. Thus, with sample sizes measured in the hundreds, only a few cases will share a given risk variant. Further support for its aetiological involvement might come from finding that it segregated with disease in relatives of these individuals (Curtis, 2011). However, an obvious complementary approach to dealing with rare variants is to analyse them jointly at the level of the gene (Curtis et al., 2008; Madsen and Browning, 2009; Lawrence et al., 2010; Morris and Zeggini, 2010; Li et al., 2011; Curtis, 2012). When this is done, the variants may together produce more significant evidence for association than any considered individually. An additional advantage of a gene-wise rather than variant-wise approach to analysis is that there is less ‘noise’ and corrections for multiple testing can be less stringent. With the degree of heterogeneity which we might expect to be present, it is perhaps not out of the question that cases and controls might differ by up to 5% in the cumulative frequency of risk alleles within a single gene. Such differences might be detectable using a sample size in the region of 1000.
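To give a feel for the last statement, here is a rough Python power calculation for a gene-wise comparison of cumulative variant allele frequencies. Everything in it is an assumption for illustration: we read "5%" as an absolute difference (5% in controls versus 10% in cases), count two alleles per person, use a normal approximation and take a Bonferroni threshold for about 20,000 genes. It uses only the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def allele_count_power(freq_controls, freq_cases, n_per_group, alpha):
    """Approximate power of a two-sided comparison of cumulative variant allele
    frequencies, counting 2 alleles per person (normal approximation, HWE
    assumed; the negligible opposite tail is ignored)."""
    n_alleles = 2 * n_per_group
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (freq_controls + freq_cases) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n_alleles)
    se_alt = math.sqrt((freq_controls * (1 - freq_controls)
                        + freq_cases * (1 - freq_cases)) / n_alleles)
    diff = abs(freq_cases - freq_controls)
    return NormalDist().cdf((diff - z_crit * se_null) / se_alt)


# Hypothetical gene: cumulative variant frequency 5% in controls, 10% in cases;
# Bonferroni threshold assuming roughly 20,000 genes are tested.
alpha = 0.05 / 20_000
for n in (500, 1000):
    power = allele_count_power(0.05, 0.10, n, alpha)
    print(n, "cases and", n, "controls: power", round(power, 2))
```

Under these assumptions, power is modest with 500 cases and 500 controls and considerably higher with 1000 of each; other baseline frequencies or significance thresholds would change these figures.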
A variety of methods exist to test the general hypothesis that rare, possibly deleterious variants within a gene may be more common in cases than controls. However, it is worthwhile to consider whether there might be some advantage to be gained by specifically testing for recessive effects. To illustrate this question, we can begin with the standard contingency table of genotype counts for a biallelic locus as shown in Table 1. This shows the numbers of individuals having each genotype. A number of standard approaches to analysis have been described previously (Lewis, 2002; Clarke et al., 2011). A natural way to analyse this table is to carry out a Pearson $\chi^2$-test in which each expected count is generated as the product of the relevant row and column totals divided by the total number of individuals, $N$. One then takes $\sum (O-E)^2/E$ to yield a $\chi^2$ statistic with d.f.=2 to test whether the genotype frequencies differ between cases and controls. An alternative is to carry out an allele-wise test to see whether allele frequencies differ; this uses the observed and expected allele counts displayed in Table 2 to produce a $\chi^2$ statistic with d.f.=1. One can carry out tests specifically based on the hypotheses that dominant or recessive effects are present, either by grouping genotypes containing at least one risk allele (AB and BB) or by grouping those with fewer than two copies of the risk allele (AA and AB); these approaches are illustrated in Tables 3 and 4. Both use two-by-two tables to produce a $\chi^2$ statistic with d.f.=1. To understand how these methods work in practice, we can take an example of a very rare allele which accounts for the disease in a proportion of cases. Examples of the application of these approaches when a dominant effect is present are shown in Tables 5a–c. The test based on the raw genotype counts produces a $\chi^2$ of 10.5 with d.f.=2, but the test with AB and BB genotypes collapsed together produces the same $\chi^2$ with only 1 d.f. and hence yields a lower P value. The test using allele counts produces very similar results to the ‘dominant’ genotype test. The examples for a recessive effect are shown in Tables 6a–c. Again, the two genotype tests produce the same $\chi^2$, but the one based on the collapsed counts (here AA and AB) has only 1 d.f. and so is more highly significant. However, here we see that the test based on allele counts produces a much higher $\chi^2$ and still has only 1 d.f., and so is much more highly significant.
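The four analyses of a genotype table described above can be sketched in Python as follows. The counts are hypothetical (they are not those of Tables 5 or 6), the P-value helper covers only 1 and 2 d.f. using the closed-form chi-square tail probabilities, and empty genotype columns are dropped rather than kept, which is our choice.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square for a table with one row per group (controls, cases).
    Expected count = row total * column total / N. Empty columns are dropped
    (their expected counts would be zero) and d.f. reduced accordingly."""
    columns = [col for col in zip(*table) if sum(col) > 0]
    row_totals = [sum(row) for row in zip(*columns)]
    col_totals = [sum(col) for col in columns]
    n = sum(col_totals)
    stat = 0.0
    for j, col in enumerate(columns):
        for i, observed in enumerate(col):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(row_totals) - 1) * (len(columns) - 1)


def chi2_pvalue(stat, df):
    """Upper-tail probability; closed forms for 1 and 2 d.f. only."""
    if df == 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch only handles up to 2 d.f.")


# Ways of collapsing genotype counts (AA, AB, BB), B being the risk allele.
COLLAPSE = {
    "genotype": lambda g: g,                                 # 2 d.f.
    "dominant": lambda g: (g[0], g[1] + g[2]),               # AA vs AB+BB
    "recessive": lambda g: (g[0] + g[1], g[2]),              # AA+AB vs BB
    "allele": lambda g: (2 * g[0] + g[1], g[1] + 2 * g[2]),  # A vs B allele counts
}


def run_tests(controls, cases):
    results = {}
    for name, collapse in COLLAPSE.items():
        stat, df = pearson_chi2([collapse(controls), collapse(cases)])
        results[name] = (stat, df, chi2_pvalue(stat, df))
    return results


# Hypothetical counts with an excess of BB homozygotes among cases.
for name, (stat, df, p) in run_tests((80, 18, 2), (70, 20, 10)).items():
    print(f"{name:9s} chi2={stat:6.3f} d.f.={df} P={p:.4f}")
```

For these made-up counts, which contain an excess of BB homozygotes among cases, the allele-wise test gives the smallest P value, followed by the ‘recessive’ test, with the 2-d.f. genotype test less significant than either.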
In this simplified example we see that, compared with genotype-wise analysis, allele-wise has similar power for dominant effects and greater power for recessive effects. We can also see that if the same proportion of cases is due to a particular variant then the allele-wise test produces a more highly significant result if the effect is recessive rather than dominant. This is obviously because in the recessive situation each case due to the variant contributes two alleles rather than one. Because of these useful properties of allele-wise tests, when one does a combined analysis of different variants within the same gene a typical approach will involve adding up counts of variant alleles. This may be close to optimal if dominant effects are present and is expected to work well also for recessive effects.
However, although comparing allele counts is clearly a good way of testing for association if recessive effects are present, it is not in fact the best. We can see this by considering Table 7a and b. One of these is consistent with a recessive effect but the other is not, yet both produce identical allele counts, as shown in Table 7c. Thus the comparison of allele counts does not produce a more significant result even when the genotype counts themselves provide more support for the presence of a recessive effect on risk.
To understand how we might obtain a test which is better able to detect association when a recessive effect is present, we can consider Table 8. Table 8a shows some example genotype counts and Table 8b shows in more detail how the $\chi^2$ statistic for the conventional genotype-wise analysis is produced. The conventional expected counts assume that each genotype has the same frequency in cases and controls, so if both sample sizes are the same the expected counts for a given genotype will be equal in the two groups. Most of the contribution to the $\chi^2$ statistic comes from the counts for the BB genotype in both cases and controls, the count for the cases being higher than expected and that for the controls being lower. By contrast, Table 8c shows the situation if we generate expected counts conditional on the observed allele counts and the presence of HWE. Now we see a different pattern. The expected counts for the BB genotype are now very low for both controls and cases. For controls this means that the observed and expected counts are in fact very similar to each other. For cases, the difference between the observed and expected counts is now very marked and produces a very large value for $(O-E)^2/E$. It is of some interest to note that the expected counts for the AB genotype are higher than the observed counts for both controls and cases. Using the expected counts assuming HWE produces a very large value for the overall $\chi^2$ statistic, although of course this would not follow the asymptotic distribution because of the very low expected values in the BB cells. However, even the combined contributions from the two AB cells would exceed the total $\chi^2$ produced by the standard analysis conditioned on row and column totals.
Although these results might at first seem somewhat surprising, on reflection they are readily understood when one considers the genotype counts individually. The conventional test only compares genotype frequencies between cases and controls. However, what the table actually shows is that the number of cases with the BB genotype is in fact far higher than could be expected if the frequencies were equal and if HWE applied. In fact, even considering the cases in isolation it is clear that there is a very marked deviation from HWE, with both homozygote genotypes being more frequent than the heterozygote genotype (which in this example is completely absent). Thus, we gain an understanding that if we wish to develop a sensitive test for a recessive effect then we should not only test that the frequency of the variant allele is higher in cases but also that the frequency of the variant homozygous genotype is higher than would be expected under HWE. It is possible to conceive a number of such tests. One might simply take the information from all cells to produce a $\chi^2$ statistic with d.f.=4. Alternatively, one might attempt to specifically test that both heterozygote counts were lower than expected, whereas the variant homozygote count was high in cases but low in controls. A parsimonious test with only 1 d.f. would be to consider the observed and expected counts for variant homozygotes in cases against the combined counts for all the other cells.
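The idea of conditioning on allele counts and HWE can be illustrated with a small Python sketch. The counts are hypothetical: the two case samples have identical allele counts (180 A and 20 B), like the situation described for Table 7, but only the first shows a recessive-looking excess of BB homozygotes. Expected counts come from the B allele frequency pooled over cases and controls; with such small expected BB counts the summed contributions should not be referred to the asymptotic $\chi^2$ distribution. The function names are ours.

```python
def hwe_expected_counts(controls, cases):
    """Expected (AA, AB, BB) counts in each group from the pooled B allele
    frequency under HWE, instead of the usual row x column totals."""
    b = sum(g[1] + 2 * g[2] for g in (controls, cases))
    q = b / (2 * (sum(controls) + sum(cases)))
    p = 1 - q
    return [[sum(g) * f for f in (p * p, 2 * p * q, q * q)]
            for g in (controls, cases)]


def cell_contributions(observed_rows, expected_rows):
    """(O - E)^2 / E for every cell; cells with E = 0 would need separate handling."""
    return [[(o - e) ** 2 / e for o, e in zip(obs, exp)]
            for obs, exp in zip(observed_rows, expected_rows)]


controls = (81, 18, 1)
for cases in [(90, 0, 10), (80, 20, 0)]:  # identical allele counts: 180 A, 20 B
    contrib = cell_contributions([controls, cases],
                                 hwe_expected_counts(controls, cases))
    print(cases, [[round(c, 2) for c in row] for row in contrib],
          "total:", round(sum(map(sum, contrib)), 2))
```

Pooling gives a B allele frequency of 0.1, so one BB case is expected in each scenario; the first case sample contributes a total of 100, driven mostly by its 10 BB cases and missing heterozygotes, whereas the second contributes only about 1.2.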
To investigate the relative performance of different approaches to analysis, they were applied to a range of simulated data sets. To assess whether the statistics conformed well to the expected distribution under the null hypothesis, sets of genotype counts were simulated using different values for MAF and different sample sizes, under the assumption of equal allele frequencies in cases and controls and of HWE. To assess the power to detect association, a simplified recessive model was used with parameters as follows. The population prevalence of the disease phenotype was set to $K$; it was assumed that a proportion $\alpha$ of cases were due to a recessive effect at the locus in question; this would mean that the background risk for individuals not homozygous for pathogenic variants at this locus would be approximately $K(1-\alpha)$; there would be many variants at this locus, each individually rare, having overall allele frequency $P$ (with the simplifying assumption that each copy of the gene contains at most one variant); of these variants only a proportion $\delta$ would be pathogenic, the others having a neutral effect on the phenotype; thus the overall allele frequency of pathogenic variants would be $P_{PATH} = \delta P$, whereas that of neutral variants would be $P_{NEUT} = (1-\delta)P$. The normal allele would be denoted A and any variant allele would be denoted B, of which a proportion $\delta$ would actually be pathogenic. However, the investigator would not know which variants were pathogenic and would treat them all equally. Individuals possessing two pathogenic variants would have an increased risk (penetrance) $f$ of developing the disease. Individuals with two variant alleles, that is with genotype BB, would therefore have a disease risk of approximately $\delta^2 f + (1-\delta^2)K(1-\alpha)$. Individuals with other genotypes would have a risk of approximately $K(1-\alpha)$. Individuals with two pathogenic variants might be homozygous for one variant or might be compound heterozygotes with copies of two different pathogenic variants.
Expected genotype frequencies for cases and controls were calculated under the assumptions that no copy of the gene contained more than one variant and that the A and B alleles could be treated as being in HWE in the population. It is acknowledged that these assumptions might not be valid but the purpose was simply to obtain some example genotype frequencies in which recessive effects might be observable. Simulated data sets of cases and controls were generated from these frequencies using a scheme of sampling with replacement.
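The following Python sketch shows one way the expected genotype frequencies and sampled counts could be generated under this model; it is our reading, not the code used for the simulations. It treats $\alpha$ as a proportion of cases and sets the background risk to $K(1-\alpha)/(1-P_{PATH}^2)$, so that overall prevalence equals $K$ exactly. Because the parameters are linked (the frequency of pathogenic homozygotes must satisfy $P_{PATH}^2 f = \alpha K$), it derives $P$ rather than setting it independently. These choices, the value $K = 0.01$ and the function names are all assumptions. With $K = 0.01$, the weakest model quoted below ($\alpha = 0.03$, $f = 0.2$, $\delta = 0.25$) gives frequencies that round to the reported 0.71|0.26|0.02 in controls and 0.69|0.25|0.05 in cases.

```python
import math
import random


def genotype_frequencies(K, alpha, f, delta):
    """Expected (AA, AB, BB) frequencies in cases and controls under the
    simplified recessive model; alpha is the proportion of cases due to two
    pathogenic variants. Also returns the implied overall variant frequency P."""
    p_path_sq = alpha * K / f          # since P_PATH^2 * f must equal alpha * K
    P = math.sqrt(p_path_sq) / delta   # P_PATH = delta * P
    if P >= 1:
        raise ValueError("parameters imply a variant frequency >= 1")
    background = K * (1 - alpha) / (1 - p_path_sq)  # risk without two pathogenic variants
    population = [(1 - P) ** 2, 2 * P * (1 - P), P ** 2]  # HWE for A/B
    risk = [background, background,
            delta ** 2 * f + (1 - delta ** 2) * background]
    cases = [g * r / K for g, r in zip(population, risk)]
    controls = [g * (1 - r) / (1 - K) for g, r in zip(population, risk)]
    return P, cases, controls


def sample_counts(freqs, n, rng):
    """Draw n genotypes with replacement and return (AA, AB, BB) counts."""
    draws = rng.choices(range(3), weights=freqs, k=n)
    return tuple(draws.count(g) for g in range(3))


# Weakest model in the text (alpha=0.03, f=0.2, delta=0.25), with K=0.01 assumed.
P, cases, controls = genotype_frequencies(K=0.01, alpha=0.03, f=0.2, delta=0.25)
print(round(P, 3), [round(x, 2) for x in controls], [round(x, 2) for x in cases])
rng = random.Random(2012)
print(sample_counts(controls, 1000, rng), sample_counts(cases, 1000, rng))
```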
Some preliminary simulations were carried out to determine which approach to testing for departure from HWE seemed most appropriate. Overall the best-performing test seemed to be the analysis with d.f.=1 which tested for an excess of the BB genotype among cases and which incorporated Yates’s continuity correction to allow for the small value of the expected count. This test is shown in detail in Table 9. In effect, this test asks whether the proportion of cases with the BB genotype is higher than would be expected given the combined allele counts from cases and controls, and it is better conceived of as a binomial test. Thus, the $\chi^2$-test is used as an approximation to the binomial probability. The test was implemented so that if the expected number of BB cases was very small (<5), an exact binomial calculation was carried out instead to obtain a P value.
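The exact implementation behind Table 9 is not reproduced here, so the following Python sketch is only our reading of the description: the expected number of BB cases is $n q^2$, with $q$ the B allele frequency pooled over cases and controls; when this is below 5 an exact upper-tail binomial probability is used, otherwise a Yates-corrected 1-d.f. $\chi^2$ comparing BB with non-BB cases. Making both branches one-sided (testing only for an excess) is our assumption, as are the function name and the example counts.

```python
import math


def excess_bb_test(controls, cases):
    """One-sided test for more BB cases than expected under HWE, given the B
    allele frequency pooled over cases and controls. Genotypes are (AA, AB, BB).
    Returns (expected BB count in cases, P value)."""
    b_alleles = sum(g[1] + 2 * g[2] for g in (controls, cases))
    q = b_alleles / (2 * (sum(controls) + sum(cases)))
    n, observed = sum(cases), cases[2]
    p_bb = q * q
    expected = n * p_bb
    if expected < 5:
        # Exact binomial upper tail: P(X >= observed) with X ~ Binomial(n, q^2).
        return expected, sum(math.comb(n, k) * p_bb ** k * (1 - p_bb) ** (n - k)
                             for k in range(observed, n + 1))
    # BB versus non-BB cases, with Yates's continuity correction (1 d.f.).
    diff = max(abs(observed - expected) - 0.5, 0.0)
    stat = diff ** 2 / expected + diff ** 2 / (n - expected)
    upper = 0.5 * math.erfc(math.sqrt(stat / 2))  # one-sided normal tail
    return expected, upper if observed > expected else 1.0 - upper


print(excess_bb_test((81, 18, 1), (90, 0, 10)))
print(excess_bb_test((81, 18, 1), (80, 20, 0)))
print(excess_bb_test((640, 320, 40), (600, 300, 100)))
```

The first two calls use case samples with identical allele counts; only the one with an excess of BB homozygotes yields a small P value.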
When applied to the data sets simulated under the null hypothesis, the conventional genotype-wise and allele-wise tests and the ‘recessive’ genotype test conformed well to the expected distribution. The analysis incorporating a test for departure from HWE was slightly anticonservative at small ($<10^{-5}$) P values but otherwise was somewhat conservative. The tests were then applied to a range of models incorporating a recessive effect using different sets of values for the parameters specified above and the results of simulations are shown in Table 10. A total of 10 000 simulations were carried out for each set of parameters. To give an example of the kinds of genotype frequencies utilized, the weakest genetic model tested, with α=0.03, f=0.2 and δ=0.25, produced AA|AB|BB genotype frequencies of 0.71|0.26|0.02 in controls and 0.69|0.25|0.05 in cases. The results show that with these kinds of effects the genotype-wise tests are more powerful than the allele-wise test and that the genotype-wise test for an excess of the variant homozygotes in cases compared with controls is somewhat more powerful than the general test. However, the test which also incorporates departure from HWE is substantially more powerful than all the conventional tests.
These results show that very substantial increases in power can be gained by applying analytic methods which test whether cases carry two variants more frequently, not only than would be expected from control genotype frequencies but also than would be expected under HWE. Such tests have not been widely used to date, and two main reasons can be proposed for this. The first is that they may provide little benefit when applied to common SNPs. These SNPs are hoped to tag rarer variants, but because LD is imperfect the SNPs themselves may not show an excess of homozygous genotypes to the same extent as the variants they tag. The second reason is the concern that apparent departures from HWE may result from technical artefacts, including genotyping errors, population structure and LD between variants. If heterozygotes are miscalled as homozygotes, departures from HWE will result; indeed, departure from HWE in controls is frequently used as a quality control criterion, even though this is exactly what one would expect to observe for a recessively acting variant (Lewis, 2002; Clarke et al., 2011). Population structure can also produce departures from HWE, though such effects might be recognizable because they would be present at a number of loci and would occur similarly in controls and cases. If different variants within the same gene are grouped together, a simple analysis will not distinguish individuals possessing two variants, one on each copy of the gene, from individuals carrying both variants in cis on a single copy. Thus any pair of variants in LD with each other would tend to mimic homozygotes. One might hope that if individual variants were very rare it would be unlikely for two to occur together, but one could not be confident of this, especially because their cumulative frequency might be fairly high, so that there might be an appreciable probability of some pair of variants occurring together on the same haplotype.
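The HWE check used in quality control can be sketched as a 1 d.f. χ2 goodness-of-fit test; an exact test is a common alternative that is not shown here. The function name and example counts below are hypothetical. The point of the sketch is that the test sees only genotype counts, so it cannot tell whether an excess of BB genotypes reflects miscalled heterozygotes or a genuine recessive effect, which is the difficulty described above.

```python
import math


def hwe_chi2_p(aa, ab, bb):
    """Chi-square (1 d.f.) goodness-of-fit test of genotype counts against HWE."""
    n = aa + ab + bb
    p = (2 * aa + ab) / (2 * n)  # A allele frequency estimated from the sample
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic locus: nothing to test
    chi2 = sum((o - e) ** 2 / e for o, e in zip((aa, ab, bb), expected))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 d.f.


# Counts in exact HWE, then the same sample with 20 heterozygotes miscalled as BB
print(hwe_chi2_p(810, 180, 10))
print(hwe_chi2_p(810, 160, 30))
```

The second call produces a strong apparent departure from HWE; a sample enriched for true BB homozygotes because of a recessive effect would be flagged in exactly the same way.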
LD between variants would thus be expected to produce results that appear to indicate a recessive effect. Closer scrutiny of such results might then reveal that the effect was driven largely by particular pairs of variants occurring together, rather than by the random pairing expected under a recessive model. The LD might also be recognizable in control individuals.
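One way to carry out this scrutiny is to tabulate which pairs of variants occur together in individuals carrying two variants and to compare each pair's count with the count expected if variants paired at random, in proportion to the product of their frequencies. The sketch below is a hypothetical illustration: it assumes the two variants per individual have already been extracted and that variant frequencies are supplied separately, for example estimated from controls. A pair observed far more often than expected would point towards LD rather than a recessive effect.

```python
from collections import Counter
from itertools import combinations_with_replacement


def pair_enrichment(double_carriers, variant_freqs):
    """Observed versus expected counts of each variant pair among two-variant carriers.

    double_carriers: list of (variant_id, variant_id) tuples, one per individual.
    variant_freqs: {variant_id: frequency}; normalised here to relative frequencies.
    Returns {(id1, id2): (observed, expected_under_random_pairing)}.
    """
    n = len(double_carriers)
    observed = Counter(tuple(sorted(pair)) for pair in double_carriers)
    total = sum(variant_freqs.values())
    f = {v: x / total for v, x in variant_freqs.items()}
    table = {}
    for a, b in combinations_with_replacement(sorted(f), 2):
        # random pairing: f_a^2 for the same variant twice, 2 f_a f_b otherwise
        share = f[a] ** 2 if a == b else 2 * f[a] * f[b]
        table[(a, b)] = (observed.get((a, b), 0), n * share)
    return table


carriers = [("v1", "v2")] * 8 + [("v1", "v1"), ("v3", "v2")]
for pair, (obs, exp) in pair_enrichment(carriers, {"v1": 1, "v2": 1, "v3": 1}).items():
    print(pair, obs, round(exp, 2))
```

In this toy example the pair v1/v2 accounts for most two-variant carriers, far more than random pairing of three equally frequent variants would predict.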
One issue not yet considered is what exactly is meant by a ‘variant’ in this context. Sequencing studies identify large numbers of variants of varying rarity and with a variety of predicted effects. Implicit in the discussion above is that one can somehow identify a set of ‘potentially interesting’ variants and then define genotypes according to whether zero, one, or two or more such variants are present. For example, one might restrict attention to variants producing a change in amino acid sequence, or additionally include those in splice sites or promoter regions. Some methods of analysis categorize variants on the basis of rarity, but a note of caution should be sounded here: if recessive effects are important, some individual variants might in fact be fairly common and yet still have escaped attention in previous GWAS. Some previous methods have applied weighting schemes to variants according to their predicted effect and rarity, but it is not immediately obvious how such methods could incorporate tests for recessive effects (Curtis et al., 2008; Madsen and Browning, 2009; Lawrence et al., 2010; Morris and Zeggini, 2010; Li et al., 2011; Curtis, 2012). Instead, all we can suggest is that the criteria determining which kinds of variant are incorporated into the analysis should be defined in advance.
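Defining a gene-level genotype from predefined criteria can be sketched as counting qualifying alleles and capping the count at two. The annotation labels and the `Call` record below are hypothetical; real annotation tools use their own consequence vocabularies. The sketch deliberately applies no frequency filter, in keeping with the caution about restricting attention to rare variants. Because the count is unphased, a value of 2 can also arise from two variants in cis, the limitation discussed earlier.

```python
from dataclasses import dataclass

# Hypothetical qualifying categories, fixed before any analysis is run
QUALIFYING = frozenset({"missense", "nonsense", "splice_site", "promoter"})


@dataclass(frozen=True)
class Call:
    variant_id: str
    consequence: str  # hypothetical annotation label
    copies: int       # number of alleles carried: 0, 1 or 2


def gene_genotype(calls, qualifying=QUALIFYING):
    """Return 0, 1 or 2: qualifying variant alleles in one gene, capped at two."""
    n = sum(c.copies for c in calls if c.consequence in qualifying)
    return min(n, 2)


person = [Call("v1", "missense", 1), Call("v2", "synonymous", 2), Call("v3", "splice_site", 1)]
print(gene_genotype(person))                          # all predefined categories
print(gene_genotype(person, qualifying={"missense"}))  # amino acid changes only
```

Changing the `qualifying` set changes which individuals count as carrying two variants, which is why the set needs to be specified before the data are examined.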
Results to date suggest that there are no common, dominantly acting variants with a substantial effect on the risk of schizophrenia. It remains possible that there are individually rare risk variants which might be detectable when analysed jointly at the level of a gene. Epidemiological data do not exclude the hypothesis that a proportion of schizophrenia might be due to recessive effects. Relatively common, recessively acting variants with large effect sizes might have been overlooked by GWAS to date, either because they failed quality control procedures based on HWE or because conventional methods of analysis were not sufficiently powerful to detect them.
Several recommendations follow. Efforts should be made to develop genotyping and sequencing technologies that reliably distinguish homozygotes from heterozygotes, so that genotype calls can be relied upon to identify true homozygotes. It would also be beneficial if these methods produced phased haplotypes, because compound heterozygotes could then be distinguished from pairs of variants in LD with each other occurring on the same haplotype. Methods of analysis that combine information from many individually rare variants should be applied. Methods should also be developed that can detect recessive effects manifesting as departures from HWE, as well as through differences between case and control allele frequencies.
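With phased haplotypes, distinguishing compound heterozygotes from variants in cis reduces to checking which copy of the gene carries each qualifying variant. In the hypothetical sketch below, each individual's two haplotypes are given as sets of qualifying variant IDs; the function names and labels are illustrative only. The unphased count is included for contrast: it scores two variants in cis as 2, the same as a compound heterozygote.

```python
def classify_phased(hap1, hap2):
    """hap1, hap2: sets of qualifying variant IDs on each copy of the gene.

    Returns 'none', 'single', 'cis_only' or 'both_copies'.
    """
    if hap1 and hap2:
        return "both_copies"  # homozygote or compound heterozygote
    carried = hap1 or hap2    # the one copy that carries variants, if any
    if not carried:
        return "none"
    return "single" if len(carried) == 1 else "cis_only"


def unphased_count(hap1, hap2):
    """What an unphased analysis sees: total qualifying alleles, capped at two."""
    return min(len(hap1) + len(hap2), 2)


for h1, h2 in [({"v1"}, {"v2"}), ({"v1", "v2"}, set()), ({"v1"}, {"v1"})]:
    print(sorted(h1), sorted(h2), classify_phased(h1, h2), unphased_count(h1, h2))
```

All three individuals in the usage example receive an unphased count of 2, but only the first and third carry variants on both copies of the gene, as a recessive model requires.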
The author thanks Nicholas Bass, who challenged him regarding the possible role of recessive effects in the aetiology of schizophrenia.
Conflicts of interest
There are no conflicts of interest.
Brown AS. Epidemiologic studies of exposure to prenatal infection and risk of schizophrenia and autism. Dev Neurobiol. 2012;72:1272–1276
Clarke GM, Anderson CA, Pettersson FH, Cardon LR, Morris AP, Zondervan KT. Basic statistical analysis in genetic case–control studies. Nat Protoc. 2011;6:121–133
Curtis D. Assessing the contribution family data can make to case–control studies of rare variants. Ann Hum Genet. 2011;75:630–638
Curtis D. A rapid method for combined analysis of common and rare variants at the level of a region, gene, or pathway. Adv Appl Bioinform Chem. 2012;5:1–9
Curtis D, Vine AE, Knight J. A simple method for assessing the strength of evidence for association at the level of the whole gene. Adv Appl Bioinform Chem. 2008;1:115–120
Dobrusin M, Weitzman D, Levine J, Kremer I, Rietschel M, Maier W, Belmaker RH. The rate of consanguineous marriages among parents of schizophrenic patients in the Arab Bedouin population in Southern Israel. World J Biol Psychiatry. 2009;10:334–336
International Schizophrenia Consortium. Rare chromosomal deletions and duplications increase risk of schizophrenia. Nature. 2008;455:237–241
Itsara A, Cooper GM, Baker C, Girirajan S, Li J, Absher D, et al. Population analysis of large copy number variants and hotspots of human genetic disease. Am J Hum Genet. 2009;84:148–161
Jayadev S, Leverenz JB, Steinbart E, Stahl J, Klunk W, Yu CE, Bird TD. Alzheimer’s disease phenotypes and genotypes associated with mutations in presenilin 2. Brain. 2010;133:1143–1154
Kalsi G, Mankoo B, Curtis D, Sherrington R, Melmer G, Brynjolfsson J, et al. New DNA markers with increased informativeness show diminished support for a chromosome 5q11–13 schizophrenia susceptibility locus and exclude linkage in two new cohorts of British and Icelandic families. Ann Hum Genet. 1999;63:235–247
Kinney DK, Teixeira P, Hsu D, Napoleon SC, Crowley DJ, Miller A, et al. Relation of schizophrenia prevalence to latitude, climate, fish consumption, infant mortality, and skin color: a role for prenatal vitamin D deficiency and infections? Schizophr Bull. 2009;35:582–595
Kringlen E. Twin studies in schizophrenia with special emphasis on concordance figures. Am J Med Genet. 2000;97:4–11
Lawrence R, Day-Williams AG, Elliott KS, Morris AP, Zeggini E. CCRaVAT and QuTie – enabling analysis of rare variants in large-scale case control and quantitative trait association studies. BMC Bioinformatics. 2010;11:527
Levinson DF, Duan J, Oh S, Wang K, Sanders AR, Shi J, et al. Copy number variants in schizophrenia: confirmation of five previous findings and new evidence for 3q29 microdeletions and VIPR2 duplications. Am J Psychiatry. 2011;168:302–316
Lewis CM. Genetic association studies: design, analysis and interpretation. Brief Bioinform. 2002;3:146–153
Li MX, Gui HS, Kwan JS, Sham PC. GATES: a rapid and powerful gene-based association test using extended Simes procedure. Am J Hum Genet. 2011;88:283–293
Madsen BE, Browning SR. A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 2009;5:e1000384
Malhotra D, McCarthy S, Michaelson JJ, Vacic V, Burdick KE, Yoon S, et al. High frequencies of de novo CNVs in bipolar disorder and schizophrenia. Neuron. 2011;72:951–963
Mansour H, Fathi W, Klei L, Wood J, Chowdari K, Watson A, et al. Consanguinity and increased risk for schizophrenia in Egypt. Schizophr Res. 2010;120:108–112
Morris AP, Zeggini E. An evaluation of statistical approaches to rare variant analysis in genetic association studies. Genet Epidemiol. 2010;34:188–193
Muir WJ, Gosden CM, Brookes AJ, Fantes J, Evans KL, Maguire SM, et al. Direct microdissection and microcloning of a translocation breakpoint region, t(1;11)(q42.2;q21), associated with schizophrenia. Cytogenet Cell Genet. 1995;70:35–40
Murphy KC, Jones LA, Owen MJ. High rates of schizophrenia in adults with velo-cardio-facial syndrome. Arch Gen Psychiatry. 1999;56:940–945
Purcell SM, Wray NR, Stone JL, Visscher PM, O’Donovan MC, Sullivan PF, Sklar P. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009;460:748–752
Ripke S, Sanders AR, Kendler KS, Levinson DF, Sklar P, Holmans PA, et al. Genome-wide association study identifies five new schizophrenia loci. Nat Genet. 2011;43:969–976
Rossler W, Riecher-Rossler A, Angst J, Murray R, Gamma A, Eich D, et al. Psychotic experiences in the general population: a twenty-year prospective community study. Schizophr Res. 2007;92:1–14
Rujescu D, Ingason A, Cichon S, Pietilainen OP, Barnes MR, Toulopoulou T, et al. Disruption of the neurexin 1 gene is associated with schizophrenia. Hum Mol Genet. 2009;18:988–996
St Clair D, Xu M, Wang P, Yu Y, Fang Y, Zhang F, et al. Rates of adult schizophrenia following prenatal exposure to the Chinese famine of 1959-1961. JAMA. 2005;294:557–562
Sterl E, Paul K, Paschke E, Zschocke J, Brunner-Krainz M, Windisch E, et al. Prevalence of tetrahydrobiopterine (BH4)-responsive alleles among Austrian patients with PAH deficiency: comprehensive results from molecular analysis in 147 patients. J Inherit Metab Dis. 2012;10:1007
Sullivan PF, Daly MJ, O’Donovan M. Genetic architectures of psychiatric disorders: the emerging picture and its implications. Nat Rev Genet. 2012;13:537–551
Vacic V, McCarthy S, Malhotra D, Murray F, Chou HH, Peoples A, et al. Duplications of the neuropeptide receptor gene VIPR2 confer significant risk for schizophrenia. Nature. 2011;471:499–503
Van Os J, Bak M, Hanssen M, Bijl RV, De Graaf R, Verdoux H. Cannabis use and psychosis: a longitudinal population-based study. Am J Epidemiol. 2002;156:319–327
Xu B, Roos JL, Levy S, Van Rensburg EJ, Gogos JA, Karayiorgou M. Strong association of de novo copy number mutations with sporadic schizophrenia. Nat Genet. 2008;40:880–885