Exome sequencing in a Swedish family with PMS2 mutation with varying penetrance of colorectal cancer: investigating the presence of genetic risk modifiers in colorectal cancer risk

Objective Lynch syndrome is caused by germline mutations in the mismatch repair (MMR) genes, such as the PMS2 gene, and is characterised by a familial accumulation of colorectal cancer. The penetrance of cancer in PMS2 carriers is still not fully elucidated because colorectal cancer risk has been shown to vary between PMS2 carriers, suggesting the presence of risk modifiers. Methods Whole exome sequencing was performed in a Swedish family carrying a PMS2 missense mutation [c.2113G>A, p.(Glu705Lys)]. Thirteen genetic sequence variants were further selected and analysed in a case-control study (724 cases and 711 controls). Results The most interesting variant was an 18 bp deletion in gene BAG1. BAG1 has been linked to colorectal tumour progression with poor prognosis and is thought to promote colorectal tumour cell survival through increased NF-κB activity. Conclusions We conclude that the genetic architecture behind the incomplete penetrance of PMS2 is complicated and must be assessed in a genome-wide manner using large families and multifactorial analysis.


Introduction
Lynch syndrome is caused by germline mutations in the mismatch repair (MMR) genes MLH1, MSH2, MSH6, PMS2 and EPCAM. It is characterised by early onset, familial accumulation of cancer, predominantly colorectal cancer (CRC) and endometrial cancer (EC), but also a wide range of tumours such as gastric, small bowel, ovarian, pancreatic and uroepithelial cancer. Initially, the PMS2 gene was not considered to cause Lynch syndrome due to low penetrance of cancer compared to other MMR genes (Liu et al., 2001). Biallelic PMS2 mutations have been associated with early onset cancer in sites such as the brain and the colorectum (De Rosa et al., 2000). It was, therefore, suggested that mutations in the PMS2 gene are inherited in an autosomal recessive fashion (de Vos et al., 2005) and that biallelic germline inactivation is required for cancer predisposition (Nakagawa et al., 2004). It was later shown (Hendriks et al., 2006) that heterozygous mutations in PMS2 cause Lynch syndrome in families where the analysis of MLH1, MSH2 and MSH6 has been negative. However, the penetrance of cancer in PMS2 carriers is still not fully elucidated. A recent register paper (Møller et al., 2017) found that PMS2 mutation carriers were at a small increased risk for CRC and EC but not at significant risk for any other Lynch syndrome-associated cancer. Heterozygous PMS2 mutation carriers were at increased risk for CRC (cumulative risk to age 80 years of 13% for males and 12% for females) and EC (13%), compared with the general population (6.6, 4.7 and 2.4%, respectively) (Ten Broeke et al., 2018).
CRC risk has been shown to vary widely between members of families having PMS2 mutations, demonstrated by the wide age range of initial CRC diagnosis (mean 52 years; range 26-86 years) as well as the difference in the mean age of CRC (P < 0.001) between probands with PMS2 mutation (mean 47 years; range 26-68 years) and other family members with a PMS2 mutation (mean 58 years; range 31-86 years) (Ten Broeke et al., 2015). This heterogeneity in risk is consistent with the lifetime CRC risk distribution for MLH1 carriers [34% (25-50%, 95% confidence interval {CI}) for males and 36% (25-51%, 95% CI) for females] and MSH2 carriers [47% (36-60%, 95% CI) for males and 37% (27-50%, 95% CI) for females] (Dowty et al., 2013), suggesting the presence of risk modifiers. These modifiers can be both genetic variations and lifestyle factors. To date, two main classes of cancer susceptibility variants have been detected: rare moderate-penetrance variants with risk allele frequencies <2% and odds ratios (OR) >2.0 and common low-penetrance alleles with risk allele frequencies >5% and OR <1.5 (Sud et al., 2017). However, it is considered likely that the spectrum of penetrance and frequency of risk alleles for many cancers occurs on a continuum. This implies that any potential genetic modifiers may vary in both frequency and effect size (Sud et al., 2017). Lynch syndrome has been estimated to account for 2.2% of all cases of CRC (Hampel et al, 2005). In Sweden, CRC is the third most common cancer with approximately 6000 cases annually (Socialstyrelsen, 2018). It is estimated that the narrow sense heritability, the component of phenotypic variance that is accounted for by additive genetic variance, in the Swedish population is 13 and 12% for colon and rectal cancer, respectively. The estimated narrow sense heritability explained by common (>1%) genome-wide association studies (GWAS) CRC SNPs (single nucleotide polymorphisms) is 0.65%, whereas all common GWAS SNPs are estimated to explain 7.42% (Jiao et al, 2014).
Supplemental digital content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal's website (www.eurjcancerprev.com).
This is an open access article distributed under the Creative Commons Attribution License 4.0 (CCBY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The Swedish family studied here carries a PMS2 missense mutation [c.2113G>A, p.(Glu705Lys)] (Miyaki et al, 1997;Lagerstedt-Robinson et al, 2007). This mutation has been found in other families with Lynch syndrome with similar clinical phenotype and does not segregate perfectly in families (Lagerstedt-Robinson et al, 2016). The studied family displays high penetrance for CRC in PMS2 carriers, as well as healthy carriers and noncarriers with late-onset CRC. Because of this obvious variation in penetrance, it is suitable for investigating the possibility of genetic risk modifiers. Whole exome sequencing was performed on five family members to identify potential genetic modifiers.

Objectives
To investigate the presence of genetic risk modifiers in a Swedish family with a PMS2 missense mutation [c.2113G>A, p.(Glu705Lys)] (Miyaki et al, 1997;Lagerstedt-Robinson et al, 2007) using whole exome sequencing data.

Family 119
Whole exome sequencing was performed on genomic DNA isolated from blood samples from individuals II:IV, III:I, III:II, II:V and II:II of family 119 (Fig. 1). As shown in Fig. 1, II:II does not carry the PMS2 mutation and was diagnosed with cancer at age 72 years, whereas the other sequenced family members all carry the PMS2 mutation. II:IV was diagnosed with cancer at age 79 years, II:V at age 61 years and his two sons, III:I and III:II, at age 53 years.

Exome sequencing of colorectal cancer samples
Genomic DNA was quantified using a Qubit Fluorometer (Life Technologies, Carlsbad, CA, USA). Sequencing libraries were prepared according to the TruSeq DNA Sample Preparation Kit EUC 15005180 or EUC 15026489 (Illumina, San Diego, CA, USA). Briefly, 1-1.5 µg of genomic DNA was fragmented using the Covaris 400 bp protocol (Covaris Inc, Woburn, MA, USA). After fragmentation, all samples were subjected to end-repair, A-tailing and adaptor ligation of Illumina Multiplexing PE adaptors. An additional gel-based size selection step was performed and the adapter-ligated fragments were subsequently enriched by PCR followed by purification using Agencourt AMPure Beads (Beckman Coulter, Pasadena, CA, USA). Exome capture was performed by pre-pooling equimolar amounts and performing enrichment in 5-or 6-plex reactions according to the TruSeq Exome Enrichment Kit Protocol (EUC 15013230). Library size was checked on a Bioanalyzer High Sensitivity DNA chip (Agilent Technologies, Santa Clara, CA, USA) whereas concentration was calculated by quantitative PCR. The pooled DNA libraries were clustered on a cBot instrument (Illumina) using the TruSeq PE Cluster Kit v3. Paired-end sequencing was performed for 100 cycles using a HiSeq 2000 instrument (Illumina) with TruSeq SBS Chemistry v3, according to the manufacturer's protocol. Base calling was performed with real time analysis (RTA) (1.12.4.2 or 1.13.48) and the resulting binary base call (BCL) files were filtered, de-multiplexed and converted to FASTQ format using CASAVA 1.7 or 1.8 (Illumina). The sequencing was performed at an average coverage of 100×.

Variant selection
Variants obtained from sequencing were first filtered to those with minor allele frequencies (MAFs) ≤0.2 in population databases (SweGen, gnomAD). Only the resulting nonsynonymous coding exonic variants were investigated (see Supplementary Tables A1 and A2). Because II:IV got cancer at age 79 years, whereas the other carriers had earlier onset, a scenario where the remaining four family members share a genetic modifier that II:IV lacks was hypothesised (scenario 1). A second scenario where all sequenced family members share a genetic modifier that the sons of II:V (III:I and III:II) are heterozygous for was also investigated (scenario 2). The homozygous alternative was not investigated because inbreeding is unlikely and the probability of the sons being homozygous is therefore low. The criteria for scenarios 1 and 2 were then applied to generate two different result sets. The genes present in these result sets were investigated by literature search using the databases Pubmed (www.ncbi.nlm.nih.gov/pubmed) and Genecards (www.genecards.org). Genes previously reported to be associated with CRC development or other cancer development, or genes that influence replication, transcription (e.g. transcription factor binding), translation, apoptosis, T-cell signalling, immune modulation, growth factor regulation or tumour suppressor ability, or that act as metastasis drivers, were selected and grouped. All other genes containing variants were disregarded: a subset was of unknown importance, whereas the remainder were considered unlikely to have an influence (for example olfactory receptors, such as OR10J1; see Supplementary Table A3, appendix section, Supplemental digital content 1, http://links.lww.com/EJCP/A366).
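The two sharing scenarios can be expressed as simple filters over per-individual genotype calls. The following is a minimal sketch only: the variant names, the genotype values and the encoding (number of alternate alleles, 0, 1 or 2) are hypothetical and are not taken from the study data or its pipeline.

```python
# Illustrative sketch of the scenario 1 / scenario 2 sharing criteria.
# Genotypes are encoded as alternate-allele counts (0, 1 or 2); this encoding
# and all variant names below are assumptions made for the example.
SEQUENCED = ("II:II", "II:IV", "II:V", "III:I", "III:II")


def carries(genotypes, person):
    """True if the person has at least one alternate allele."""
    return genotypes.get(person, 0) > 0


def matches_scenario_1(genotypes):
    """Variant shared by all sequenced members except II:IV, who lacks it."""
    others = [p for p in SEQUENCED if p != "II:IV"]
    return all(carries(genotypes, p) for p in others) and not carries(genotypes, "II:IV")


def matches_scenario_2(genotypes):
    """Variant shared by all sequenced members, heterozygous in III:I and III:II."""
    shared_by_all = all(carries(genotypes, p) for p in SEQUENCED)
    sons_heterozygous = genotypes.get("III:I") == 1 and genotypes.get("III:II") == 1
    return shared_by_all and sons_heterozygous


# Hypothetical genotype calls for three made-up variants.
variants = {
    "var_A": {"II:II": 1, "II:IV": 0, "II:V": 1, "III:I": 1, "III:II": 1},
    "var_B": {"II:II": 1, "II:IV": 1, "II:V": 2, "III:I": 1, "III:II": 1},
    "var_C": {"II:II": 0, "II:IV": 1, "II:V": 1, "III:I": 0, "III:II": 1},
}

scenario_1 = [name for name, g in variants.items() if matches_scenario_1(g)]
scenario_2 = [name for name, g in variants.items() if matches_scenario_2(g)]
print(scenario_1, scenario_2)
```

In this toy input, only the first variant fits scenario 1 and only the second fits scenario 2; the third is shared by neither pattern and would be dropped.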
Fig. 1. Pedigree of family 119. Family member II:IV got cancer at age 79 years. The other mutation carriers had earlier onset cancer. A scenario where the remaining four family members share a genetic modifier that II:IV lacks was hypothesised (scenario 1). A second scenario where all sequenced family members share a genetic modifier that the sons of II:V (III:I and III:II) are heterozygous for was also investigated (scenario 2).
The presence of the sequence variants in the selected genes was evaluated in 66 previously investigated CRC families. One previously investigated individual was randomly selected from each family and the presence of each specific sequence variant in any zygosity state was reported. Two ratios were also calculated by dividing the frequency in the CRC families by the largest reported frequency in the mentioned databases and by the SweGen frequency. Variants with a ratio above 1.10, and thereby more common in the CRC families, and with a frequency of at least 0.5% in SweGen were selected. After these filtering steps, the variants in the first scenario were evaluated using association analysis in 724 cases and 711 controls (Table 1).
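The frequency-quotient filter described above can be sketched as follows. This is an illustration under stated assumptions, not the study's actual procedure: the function names and example frequencies are hypothetical, the family frequency is taken as the fraction of the 66 sampled individuals carrying the variant, and it is assumed that both quotients must exceed 1.10 (the text does not say whether one or both were required).

```python
# Illustrative sketch of the enrichment filter; all inputs are made-up values.
N_FAMILIES = 66  # one randomly selected individual per CRC family


def enrichment_quotients(n_carriers, swegen_freq, gnomad_freq):
    """Return (family frequency / largest database frequency,
    family frequency / SweGen frequency)."""
    family_freq = n_carriers / N_FAMILIES
    largest_db_freq = max(swegen_freq, gnomad_freq)
    return family_freq / largest_db_freq, family_freq / swegen_freq


def passes_filter(n_carriers, swegen_freq, gnomad_freq,
                  min_quotient=1.10, min_swegen_freq=0.005):
    """Assumed rule: SweGen frequency >= 0.5% and both quotients > 1.10."""
    if swegen_freq < min_swegen_freq:
        return False  # too rare in SweGen; also avoids dividing by zero
    q_db, q_swegen = enrichment_quotients(n_carriers, swegen_freq, gnomad_freq)
    return q_db > min_quotient and q_swegen > min_quotient


# Hypothetical examples: (carriers among 66, SweGen frequency, gnomAD frequency)
enriched = passes_filter(8, swegen_freq=0.05, gnomad_freq=0.08)
not_enriched = passes_filter(2, swegen_freq=0.03, gnomad_freq=0.04)
too_rare = passes_filter(3, swegen_freq=0.002, gnomad_freq=0.001)
print(enriched, not_enriched, too_rare)
```

Because the family frequency here counts carriers in any zygosity state while database frequencies are typically allele frequencies, the quotients are only a rough enrichment indicator.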

Association analysis
TaqMan was performed for 12 of the 13 selected candidate sequence variants. One of the variants (BAG1, rs574287414), consisting of a deletion of 18 bp, was analysed using fragment length analysis. The 724 cases and 711 controls were each divided into two 384-well plates. TaqMan was performed as previously described. The amount of DNA per well was between 100 and 250 ng (Supplementary Table A4, appendix section, Supplemental digital content 1, http://links.lww.com/EJCP/A366). Fragment analysis was performed using 250 ng DNA per reaction (Supplementary Table A5, appendix section, Supplemental digital content 1, http://links.lww.com/EJCP/A366). PCR products were run on an ABI 3130 genetic analyzer; 5 µl Milli-Q purified water was added to each well and the plate was centrifuged for 20 s. Sizing of the PCR fragments was performed by KI Gene, Center for Molecular Medicine, Karolinska Institute, Solna, Sweden.

Results
In total, five individuals within the family were selected for whole exome sequencing: two brothers with cancer at a high age, with (II:IV) and without (II:II) the PMS2 mutation, and three individuals with early onset CRC and the PMS2 mutation (II:V, III:I and III:II). It was hypothesised that II:V, III:I and III:II, who have early onset CRC and the PMS2 mutation, should have the modifier, as should II:II, who has late onset but lacks the PMS2 mutation. II:IV, who has late onset as well as the PMS2 mutation, should lack the modifier. We identified 13 candidate variants that fulfil these criteria in the genes HMGXB3, PSMD2, TMEM123, ANKHD1-EIF4EBP3, TIMD4, ZC3H12D, HOXA1, HOXA7, AOC1, FGL1, GEM, BAG1 and NDOR1 (Table 1). The following variants could not be evaluated due to technical difficulties: rs142114383 (HMGXB3), rs11545169 (PSMD2), rs11547915 (TMEM123) and rs78410337 (HOXA7). The SNP rs78410337 had two underlying SNPs in the probe design area, which is why two different probes were used for evaluation. No significant effects were found for any evaluated variant. Odds ratios above 1 were observed for rs6873053 (TIMD4), rs62587579 (NDOR1), rs61997220 (ZC3H12D) and rs574287414 (BAG1) (Table 1).
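For readers unfamiliar with how an odds ratio and its uncertainty are obtained from case-control counts, the following minimal sketch computes a carrier-based OR with a Wald 95% confidence interval. The carrier counts are hypothetical (only the totals of 724 cases and 711 controls come from the text), and the Wald interval is a standard textbook choice assumed here for illustration; the statistical method actually used for Table 1 is not stated in this section.

```python
import math


def odds_ratio_wald(case_carriers, case_total, control_carriers, control_total, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 carrier table."""
    a = case_carriers                     # cases carrying the variant
    b = case_total - case_carriers        # cases without the variant
    c = control_carriers                  # controls carrying the variant
    d = control_total - control_carriers  # controls without the variant
    if min(a, b, c, d) <= 0:
        raise ValueError("Wald interval is undefined when a cell count is zero")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical carrier counts; only the group sizes are from the study.
or_est, ci_low, ci_high = odds_ratio_wald(40, 724, 33, 711)
print(f"OR = {or_est:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

With counts of this size, an OR of roughly 1.2 has a confidence interval that spans 1, which illustrates why modest ORs for uncommon variants do not reach significance in a sample of about 1400 individuals.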

Discussion
The Swedish family investigated here, family 119, harbours many CRC cases as well as a PMS2 missense mutation [c.2113G>A, p.(Glu705Lys)] (Fig. 1). Because the PMS2 mutation does not segregate perfectly with the CRC within the family, a modifying genetic effect is plausible. This makes the family suitable for studying the incomplete penetrance of PMS2.
The selected family has a promising structure for studying the incomplete penetrance of PMS2 and the possibility of a genetic modifier because it contains both healthy individuals and individuals with the disease of various ages, both with and without PMS2 mutations present. If such a modifier as the one hypothesised here does exist, it should likely be found. However, no significant effects were found in this study.
The ORs of interest are 1.11, 1.13, 1.14 and 1.22 for rs68730353, rs62587579, rs61997220 and rs574287414, respectively (Table 1). We note that these ORs are low. rs68730353 is in gene TIMD4, which has been found to be cancer-related (Yano et al., 2017). The gene coding for NDOR1 (rs62587579) has been found to be upregulated in colonic adenoma cells (Paine et al., 2000) and ZC3H12D (rs61997220) has been reported to be a possible tumour suppressor gene (Minagawa et al, 2007). The 18 bp deletion in BAG1 (rs574287414) displays the highest OR of 1.22. BAG1 (BAG cochaperone 1) is a protein-coding gene and binds to the oncogene BCL2. BCL2 is an anti-apoptotic membrane protein and BAG1 enhances the antiapoptotic effects of BCL2 (NCBI's Gene, 2022). Numerous protein isoforms are encoded by BAG1 mRNA. Nuclear BAG1 expression has been linked to a poor prognosis in CRC and may be used as a predictive factor for distant metastasis by means of immunohistochemical staining (Kikuchi et al., 2002). Survival tended to be shorter for patients with BAG1-positive nuclei in tumours than for those without nuclear BAG-1 staining (P = 0.011). By using gamma radiation or a vitamin D analogue to induce apoptosis, it has been demonstrated that apoptosis was preceded by a decrease in nuclear and an increase in cytoplasmic BAG1 expression (Barnes et al., 2005). The expression of the nuclear BAG-1L isoform enhanced cellular survival after induced apoptosis. Cytoplasmic BAG-1S fused with a nuclear localisation signal seemed to protect cells against induced apoptosis. Knockdown of BAG-1 in a CRC cell line has been demonstrated to inhibit NF-κB (nuclear factor-κB) transcriptional activity (Clemo et al., 2008). It has therefore been proposed that BAG-1 promotes colorectal tumour cell survival through increased NF-κB activity. A link between BAG1 and TGFB1 has also been shown; induction of BAG-1L caused suppression of TGFB1 mRNA in colorectal tumour cells (Skeen et al., 2013).
Even though the 18 bp deletion is likely damaging to the protein structure, regulatory relationships are often complicated, and one cannot exclude that the variant acts as a driver only in combination with certain genetic backgrounds (Huang et al., 2014). How the BAG1 gene interacts with the PMS2 missense mutation in this family has not been explored.
The four variants that could not be evaluated due to technical difficulties, rs142114383, rs11545169, rs11547915 and rs78410337, are in genes related to carcinogenic processes. rs11545169 is in gene PSMD2, which may be related to the tumour necrosis factor signalling pathway (Tsurumi et al., 1996). rs142114383 is in HMGXB3, whose suppression has been found to result in tumorigenesis in the colorectal epithelium (Sun et al., 2017). TMEM123 (rs11547915) is proposed to function as a cell surface receptor that mediates cell death (Huang et al., 2013) and rs78410337 is in gene HOXA7, which is related to acute myeloid leukaemia (Luo et al., 2018).
A possible explanation for why the selected variants do not display significant effects is that they do not impact the penetrance of CRC. The relationship between CRC and incomplete penetrance is likely much more complicated than hypothesised here. Even if effects were detected, there is a limit to their ability to explain CRC heritability. Because the variants studied were considered in combination with only one PMS2 variant, this investigation is unlikely to explain much more of the complex heritability of CRC in the Swedish family studied here. Furthermore, only coding variants were considered, whereas most of the variation is situated within noncoding regions.

Limitations
Limitations include the small sample size and the rarity of the selected variants in the Swedish population.

Conclusion
In this study, whole exome sequencing was performed in a Swedish family carrying a PMS2 missense mutation [c.2113G>A, p.(Glu705Lys)]. Thirteen genetic sequence variants were further selected and analysed in a case-control study (724 cases and 711 controls). The variants with the highest ORs were in the genes TIMD4, NDOR1, ZC3H12D and BAG1. The most interesting variant found was an 18 bp deletion (rs574287414) in gene BAG1, which may play a role in colorectal tumorigenesis. It is not known how the BAG1 gene interacts with the PMS2 mutation in the Swedish family. The heritability of CRC is likely more complex than hypothesised here, and further work is needed to elucidate it. Further theoretical development and a better understanding of molecular biology are likely needed to reach the next major breakthrough in the heritability of CRC.