Nucleic Acids Research
Home Mutagenic mechanisms of cancer-associated DNA polymerase ϵ alleles
Mutagenic mechanisms of cancer-associated DNA polymerase ϵ alleles
Mutagenic mechanisms of cancer-associated DNA polymerase ϵ alleles

Article Type: Research Article Article History
  • Altmetric
Abstract

A single amino acid residue change in the exonuclease domain of human DNA polymerase ϵ, P286R, is associated with the development of colorectal cancers, and has been shown to impart a mutator phenotype. The corresponding Pol ϵ allele in the yeast Saccharomyces cerevisiae (pol2-P301R), was found to drive greater mutagenesis than an entirely exonuclease-deficient Pol ϵ (pol2–4), an unexpected phenotype of ultra-mutagenesis. By studying the impact on mutation frequency, type, replication-strand bias, and sequence context, we show that ultra-mutagenesis is commonly observed in yeast cells carrying a range of cancer-associated Pol ϵ exonuclease domain alleles. Similarities between mutations generated by these alleles and those generated in pol2–4 cells indicate a shared mechanism of mutagenesis that yields a mutation pattern similar to cancer Signature 14. Comparison of POL2 ultra-mutator with pol2-M644G, a mutant in the polymerase domain decreasing Pol ϵ fidelity, revealed unexpected analogies in the sequence context and strand bias of mutations. Analysis of mutational patterns unique to exonuclease domain mutant cells suggests that backtracking of the polymerase, when the mismatched primer end cannot be accommodated in the proofreading domain, results in the observed insertions and T>A mutations in specific sequence contexts.

Herzog,Alonso-Perez,Salguero,Warringer,Adams,Jackson,and Puddu: Mutagenic mechanisms of cancer-associated DNA polymerase ϵ alleles

INTRODUCTION

Aneuploidies or large genomic rearrangements are a common feature of many types of cancer, but widespread hypermutation—the extensive accumulation of single nucleotide variants (SNVs) or small insertion/deletions (INDELs)—is comparatively less frequent (17% of adult cancers analysed in (1)). Hypermutation usually arises from exposure to mutagens, such as ultra-violet light or tobacco smoke, and/or from hereditary or acquired DNA repair defects leaving a trace of specific mutation signatures (2). DNA mismatch repair (MMR) inactivation, for example, has long been known to drive somatic hypermutation that leads to hereditary colorectal adenocarcinomas (3). More recently, specific mutator alleles altering the exonuclease (proofreading) domain of replicative DNA polymerases δ and ϵ (Pol δ and Pol ϵ), such as POLE-P286R, were found to foster SNV hyper-mutation in the presence of functional MMR, and drive the development of hereditary colorectal or endometrial cancers (4–6).

The proofreading activity of B-family DNA polymerases (such as Pol δ and Pol ϵ) is triggered by the presence of a base-pair mismatch between the template and the nascent DNA strand at the primer-template junction, originating from base mis-insertion or from a damaged template base (7,8), and, possibly, by other impediments to elongation. In these situations, the 3′ end of the primer is melted and moved to the spatially separate proofreading domain, where one or more nucleotides are exonucleolytically degraded (9–11). The primer end is then returned to the polymerase domain, where DNA synthesis can continue (12,13).

The origin of hypermutation in cancer cells with Pol ϵ proofreading domain mutants was originally ascribed to inactivation of Pol ϵ exonuclease activity (4). Subsequent work in the yeast S. cerevisiae, however, revealed that pol2-P301R—the ortholog of the human Pol ϵ mutation POLE-P286R—drives substantially greater mutagenesis than Pol ϵ exo (14), an exonuclease-deficient variant of Pol ϵ encoded by the yeast pol2–4 allele (15). This ultra-mutator phenotype indicates that inactivation of the exonuclease activity is not primarily at the origin of the massive mutation accumulation observed in pol2-P301R cells; and accordingly, biochemical work revealed that exonuclease activity is still detectable in the P301R mutant polymerase (∼20% to ∼60% of wild-type, depending on the assay) (16). Further structural analyses revealed that this amino acid residue change creates a barrier at the entrance of the exonuclease domain, possibly preventing the newly synthesized strand from accessing it (17). Inability to position the mismatched primer in the proofreading domain, and the observed increased mismatch extension ability, could explain the ultra-mutagenic phenotype, but in vitro polymerase assays failed to recapitulate ultra-mutagenesis (16). These observations open up the possibility that other cellular processes may be involved in the generation of mutations and motivated the studies we describe herein.

MATERIALS AND METHODS

Yeast strains and plasmids

Sequence alignment with Clustal Omega version 1.2.1 was carried out to determine the Saccharomyces cerevisiae residues orthologous to human Pol ϵ residues mutated in the literature analysed. Uniprot sequences used for alignment were Homo sapiens POLE (Q07864) and S. cerevisiae Pol2 (P21951). All S. cerevisiae strains used were derived from the laboratory strain W303 (leu2–3,112 trp1–1 can1–100 ura3–1 ade2–1 his3–11,15 RAD5). Polymerase mutant alleles were created by cloning an N-terminal POL2 PCR fragment into pRS306 and generating the mutations of interest by site directed mutagenesis using the QuickChange Lightning Kit following manufacturer's instructions (Agilent Technologies). Polymerase mutant alleles were introduced into MATa haploid S. cerevisiae W303 strains, resulting in a full-length copy carrying the mutation and a non-mutated, truncated N-terminal fragment. Haploid pol2 mutants were then mated to a wild-type isogenic MATalpha strain to generate heterozygous diploid mutant strains. Deletion of MSH2 was introduced in wild-type W303 by one-step gene disruption. Disruptions were confirmed by PCR and whole-genome sequencing. Haploid double mutants pol2 msh2Δ were recovered by mating haploid pol2 with haploid msh2Δ strains, sporulation, tetrad dissection and analysis. The genotypes of strains are described in Supplementary Table S1.

Growth rate and mutation assays

The growth rates of heterozygous diploid polymerase mutant strains were assessed by growing cultures to stationary phase, diluting them into rich medium and growing for 450 min. Growth was assayed by measuring absorbance at 595 nm wavelength. To determine forward mutation rates at the LYP1 locus, single colonies were excised from agar plates with scalpels, inoculated in rich medium, and grown to saturation. Cell cultures were subsequently diluted 1:100,000 and plated on YPAD (1% yeast extract, 2% peptone, 2% glucose, 40 mg/l adenine) plates, or plated (50–250 μl) without dilution on SD (Synthetic Defined media) –LEU +Thialysine (50 mg/l) plates. Thialysine kills cells expressing an active Lyp1 transporter. Different amounts were used for different strains to obtain countable plates. Mutation rates were calculated as described in (18).

Single-cell bottleneck propagation

Heterozygous diploid strains were grown on solid media at 30°C and propagated for 26 passages, every 2–3 days, through single-cell bottlenecks by means of repetitive isolation and single-colony picking. Cells were estimated to have undergone ∼20 generations per passage to generate a colony of 106 cells (∼500 generations during the entire propagation). Colonies were picked randomly to avoid bias towards adaptive or deleterious mutations (with the exception of lethal mutations). In each experiment, each strain was propagated in 18 parallel lines. Deviations from this number due to failures to sequence or to confirm the presence of the mutation at the end of the propagation are denoted in the relevant figures. Haploid strains were propagated for 13 passages. Standard YPAD non-selective rich medium was used. In each experiment, whole-genome sequencing of two random colonies for each strain was attempted at passage 0 and at the end of the propagation. Only mutations observed in both colonies (where a second colony was available), and absent from passage 0, were retained for further analysis.

Small-population bottleneck propagation

Haploid and heterozygous diploid polymerase mutant strains were propagated in a 1536 plate format in a non-selective complete synthetic medium (0.14% YNB, 0.5% ammonium sulphate, 0.077% complete supplement mixture [ForMedium], 2% (w/v) glucose and pH buffered to pH 5.8 with 1% (w/v) succinic acid). Plates were replicated using a ROTOR Robot (Singer Ltd, UK) and 1536 short pin pads every 2–3 days for 40 passages, through bottlenecks estimated to contain ∼104 cells. Effective population size is, however, likely to be smaller due to the population structure of cells in a colony. The number of cells in the bottleneck was calculated by estimates of pixel intensities using light transmission and conversion of pixel intensities into cell counts by calibration to a flow cytometry-based reference (19). In these conditions, wild-type BY strains undergo ∼6 doublings per growth cycle (passage) suggesting that cells underwent ∼240 generations over the duration of the experiment. Final populations were streaked for single colonies and the whole genome of 18–26 isolates per strain was sequenced. After reassigning strains to the correct genotypes and ploidy 12–38 isolates per strain were analysed.

DNA extraction, library preparation and whole genome sequencing

Genomic DNA extractions and library preparations were carried out as previously described (20). Libraries were sequenced using either HiSeq 2000 or HiSeq X (Illumina) to generate 125 or 150 bp paired-end reads, respectively.

Reference genome alignment

Sequencing reads were aligned to the S. cerevisiae S288c (R64–1-1) reference genome using BWA mem (-t 16 -p -T 0) and duplicates were marked with bamstreamingmarkduplicates (biobambam2 2.0.50) and stored in CRAM format (primary data). From these, reads were extracted with samtools fastq and subsequently re-aligned to a modified reference genome in which repetitive DNA regions were hard-masked and moved, as single-copy sequences, to ad hoc artificial chromosomes. Duplicates were marked with bamsormadup SO = coordinate fixmate = 1.

Confirmation of strain genotypes

Samples were automatically checked for their expected polymerase genotype using the script deletion_check.pl. Briefly, for point mutations the DNA sequence from the triplet coding for the residue in question was extracted from the sequencing data, translated and compared with the expected. Deletions and genetic mating type were determined as previously described (20). Ploidy was determined a posteriori, based on the distribution of the observed allelic frequencies (AF). Strains displaying a median of alleles with AF ∼0.5 were classified as diploid, while strains with a median AF of ∼1 were classified as haploid. In the rare cases in which the number of de-novo alleles was too low, generating intermediate AF, the genetic mating type, identified bioinformatically, was used.

Variant calling, consequence annotation and filtering

SNVs and small insertions/deletions (INDELs) were identified chromosome by chromosome using samtools mpileup (v.1.9), with the following options: -g -t DP,DV -C0 -p -m3 -F0.2 -d10000, followed by bcftools call -vm -f GQ (v.1.9). All mutations from each chromosome were merged with bcftools concat. All variants were annotated with the Ensembl Variant Effect Predictor (VEP; v95.3). INDELs were subsequently normalized with bcftools norm -m-both –check-ref e and sorted with bcftools sort. Low quality variants were flagged with bcftools filter with the following options -m + -e ‘INFO/DP<10’ -e ‘FORMAT/DV<3’ -e ‘TYPE = \‘snp\’ & QUAL<100’ -e ‘TYPE = \‘indel\’ & QUAL<30’ -e ‘FORMAT/GQ<40’ -g 7. Variants present in control samples were subsequently removed with bcftools isec -w1 -C {sample_file} {control_file}.

Further mutation filtering

SNV mutations were further filtered on the QUAL value and their prevalence across different sequencing samples. Given the relatively low number of mutated positions compared to the genome size, the vast majority of mutations are expected to be unique in different MA lines in single-cell bottleneck experiments, and shared mutations are likely to originate from systematic sequencing errors. Taking this into account, we removed mutations whose quality was below an arbitrary threshold that grows linearly with the prevalence of the mutation in different samples (Supplementary Figure S1), thus excluding mutations that are frequently observed and of lower quality. In small-population bottleneck experiments, many mutations are shared between different colonies from the final population, because of their shared ancestry. For this reason, a similar, less stringent threshold was used. Filters were designed to remove ∼1–10% of all mutations. A similar rationale was used to filter INDELS. Small changes in the filtering parameters do not substantially alter the results of the subsequent analyses.

Analysis of mutation numbers

The total number of SNPs/INDELs for each sequencing sample was calculated by counting the number of mutations passing all filters. In single-cell bottleneck propagations two colonies per MA line were sequenced and only mutations observed in both colonies (and absent from passage 0) were retained for further analysis; where a second colony was not available because of sequencing failure, all mutations in the only available colony were retained. For small population bottleneck propagation all mutations present in each colony from the final population were considered. Mutation rates are given in terms of SNV(INDEL)/haploid genome/passage and converted to SNV/generation/bp assuming a haploid genome size of 12 071 326 bp and 20 generations from single cell to colony.

Analysis of mutation types

In small-population propagation experiments, one single mutagenic event is likely observed in more than one colony picked from the final-population. Thus, mutations derived from small-population experiments were initially grouped by MA line and, in each line, when the same mutation was observed in more than one colony, only one instance was retained. Mutations from single-cell and small-propagation experiments were then pooled for the purpose of the subsequent analyses. Analysis of the frequency of different SNV classes was carried out by grouping the mutations by genotype (irrespectively of the type of propagation or ploidy), counting the number relevant mutations, and summing complementary pairs (e.g. A>C + T>G).

Analysis of replication strand bias

The relative position of each mutation with respect to the nearest replication origin was calculated in two steps. First, a replication model was built using the coordinates of replication origins obtained from OriDB (http://cerevisiae.oridb.org; only using ‘confirmed’ origins); location of each origin was calculated as the midpoint of the ARS region; termination points were arbitrarily defined as the midpoint of each inter-origin span; leading-strand regions were defined as regions comprised between an origin and the termination point to its immediate right; lagging-strand regions were defined as regions comprised between an origin and the termination point to its immediate left. Second, each mutation was localized to an inter-origin span (thus, mutations located before the first origin or after the last origin of each chromosome were discarded); the distance between each mutation and the origin to its immediate left was calculated, and normalized for the size of the inter-origin span in which the mutation was located, so that a distance of 50% coincides to the midpoint termination zone, and a distance of 100% coincides to the subsequent origin. To avoid over-weighting shared mutations originating from small-population bottleneck experiments, only a distinct set of mutations was considered. The density of each mutation type was then plotted as a function of the relative distance of mutations from the origin to the immediate left.

Analysis of mutation patterns and comparison with mutation signatures

Mutational patterns were obtained by calculating the frequency of the 96 trinucleotide contexts (channels) in which mutations belonging to one of the six main classes (C>A, C>G, C>T, T>A, T>C, T>G) occurred. The remaining mutations (G>T, G>C, G>A, A>T, A>G, A>C) were reverse complemented along with their context and assigned to the appropriate channel. The frequency of each channel was then normalized by the relative abundance of each trinucleotide in the yeast genome.

Comparison with mutational signatures identified in cancers was calculated using cosine similarity (21) and the cos_sim_matrix function of the MutationalPatterns R package (22). Cancer signatures were obtained from https://cancer.sanger.ac.uk/cancergenome/assets/signatures_probabilities.txt.

Analysis of mutation sequence context

After extracting the context (five nucleotides) in which each mutation occur, mutations were classified as right or left mutations depending on where they occurred in relation to origins of replication and presumed termination points (see Analysis of replication strand bias). To exclude as much as possible mutations introduced by Pol ϵ synthesizing DNA beyond the midpoint of each replicon, only mutations occurring in the first and last third of each replicon were considered. Sequence logo images from the context of each mutation class were obtained with the ggseqlogo R package (23).

RESULTS

A spectrum of DNA Pol ϵ ultra-mutator alleles

As an approach to investigate how ultra-mutator Pol ϵ mutants exert their genotoxic activities in vivo, we focused on POLE alleles (Figure 1A) originally described as drivers of colorectal and endometrial cancers (4–6). To avoid confounding factors that would arise from conducting such studies directly in cancer cell lines, such as a higher background of genomic instability, we introduced, where possible, the corresponding mutations in heterozygous state into the diploid yeast strain W303. Thus, we generated eight yeast strains, each carrying one of eight pol2 alleles encoding the different aminoacid changes in evolutionary conserved residues (Figure 1A). Several colonies (18–54) for each POL2/pol2- heterozygous diploid strain were then independently cultured through single-cell bottlenecks (26 passages; ∼500 cell generations), allowing mutations to accumulate in the genome. Mutational events that occurred during the experiment were identified by whole-genome sequencing of each mutation accumulation (MA) line at the start and the end of the experiment. For comparison, we also carried out such analyses of MA lines containing wild-type POL2 or a proofreading defective pol2–4 allele (15,24). Since at each passage colonies were randomly selected, we expect mutation rates and spectra to be unbiased with regard to their consequences on gene function, except for overlooking lethal mutations.

Analysis of ultra-mutator alleles of DNA polymerase ϵ modelled in S. cerevisiae. (A) Outline of DNA polymerase ϵ ultra-mutator alleles identified in cancers and corresponding mutations in S. cerevisiae POL2; the numbers indicate the residue position and domains are indicated with coloured boxes (DUF: domain with unknown function; Palles: mutation first reported in Palles C., et. al, Nat. Gen. 2013; Church: mutation first reported in Church DN, et al., Hum. Mol. Genet. 2013; TCGA: mutation present in The Cancer Genome Atlas). Mutations in grey occur in residues not evolutionary conserved in S. cerevisiae. The most studied mutation (P286R/P301R) is indicated in bold. (B) Rates of mutation accumulation in diploid yeast strains carrying indicated heterozygous POL2 alleles propagated through single-cell passage bottlenecks for ∼500 generations. Each independent evolution line is indicated by a dot, while the median is indicated in red. The number of independent lines (n) is indicated below. Statistical test: Mann–Whitney–Wilcoxon (C) Comparison of mutation accumulation rates (x-axis) and loss-of-function mutation rates at LYP1, yielding thialysine resistance (y-axis). Blue line indicates the linear regression model; shaded area the 95% confidence interval; R is Pearson's correlation coefficient. (D) Rates of mutation accumulation in strains carrying the indicated heterozygous POL2 alleles propagated through small-population passage bottlenecks for 350–450 generations. Different colonies derived from the final population are indicated by dots, while the median is indicated in red. The number of colonies studied (n) is indicated below. Statistical test: Mann–Whitney–Wilcoxon. (E) Frequency of mutations generated in the presence of the indicated pol2 alleles, by type of mutation and trinucleotide context in which the mutations occur; frequencies have been corrected for the trinucleotide frequency in the yeast genome; see Figure 2C for x-axis labels. (F) Geographical distribution of the density of mutations by type in relation to replication origins. The origins to the left and to the right of each mutation are indicated by L and R respectively, and the distance is expressed as a percentage of the inter-origin distance of each origin pair (L→50%: leading strand; 50%→R: lagging strand). Asterisks indicate mutation types significantly deviating from uniformity (P< 0.01 χ2 test).
Figure 1.

Analysis of ultra-mutator alleles of DNA polymerase ϵ modelled in S. cerevisiae. (A) Outline of DNA polymerase ϵ ultra-mutator alleles identified in cancers and corresponding mutations in S. cerevisiae POL2; the numbers indicate the residue position and domains are indicated with coloured boxes (DUF: domain with unknown function; Palles: mutation first reported in Palles C., et. al, Nat. Gen. 2013; Church: mutation first reported in Church DN, et al., Hum. Mol. Genet. 2013; TCGA: mutation present in The Cancer Genome Atlas). Mutations in grey occur in residues not evolutionary conserved in S. cerevisiae. The most studied mutation (P286R/P301R) is indicated in bold. (B) Rates of mutation accumulation in diploid yeast strains carrying indicated heterozygous POL2 alleles propagated through single-cell passage bottlenecks for ∼500 generations. Each independent evolution line is indicated by a dot, while the median is indicated in red. The number of independent lines (n) is indicated below. Statistical test: Mann–Whitney–Wilcoxon (C) Comparison of mutation accumulation rates (x-axis) and loss-of-function mutation rates at LYP1, yielding thialysine resistance (y-axis). Blue line indicates the linear regression model; shaded area the 95% confidence interval; R is Pearson's correlation coefficient. (D) Rates of mutation accumulation in strains carrying the indicated heterozygous POL2 alleles propagated through small-population passage bottlenecks for 350–450 generations. Different colonies derived from the final population are indicated by dots, while the median is indicated in red. The number of colonies studied (n) is indicated below. Statistical test: Mann–Whitney–Wilcoxon. (E) Frequency of mutations generated in the presence of the indicated pol2 alleles, by type of mutation and trinucleotide context in which the mutations occur; frequencies have been corrected for the trinucleotide frequency in the yeast genome; see Figure 2C for x-axis labels. (F) Geographical distribution of the density of mutations by type in relation to replication origins. The origins to the left and to the right of each mutation are indicated by L and R respectively, and the distance is expressed as a percentage of the inter-origin distance of each origin pair (L→50%: leading strand; 50%→R: lagging strand). Asterisks indicate mutation types significantly deviating from uniformity (P< 0.01 χ2 test).

In line with previous estimates (1.67–3.8 × 10–10 SNV/generation/bp) (25–27), wild-type MA lines acquired a median of 0.1 SNV/haploid genome/passage (or ∼4.14 × 10–10 SNV/generation/bp), while strains carrying a proofreading-defective allele (pol2–4/POL2) displayed a modest increase over this rate (∼30% or 7.04 × 10−10 SNV/generation/bp; Figure 1B). Four pol2 alleles led to either no detectable mutator phenotype (Q468R, D290V) or a hypermutator phenotype similar in magnitude to that of pol2–4/POL2 cells (V426L, A480V; Figure 1B). In contrast, four other Pol ϵ alleles (P301R, M459K, S312F, L439V) accumulated considerably more mutations than would be expected by simple lack of exonuclease activity (3–21 times or 2.1–14.7 × 10–9 SNV/generation/bp), thus reflecting an ultra-mutator phenotype. Since the growth rates of these strains did not substantially differ from those of the other POL2 mutants or from the wild-type strain (Supplementary Figure S2A), MA could be taken as an accurate reflection of mutation rates. Accordingly, when we characterised haploid strains carrying wild-type or various mutant POL2 alleles for their rates of loss-of-function mutations yielding thialysine resistance (mainly occurring via inactivation of LYP1 (28)), there was good concordance (R = 0.81) between these data and results from MA experiments (Figure 1C).

Notably, the presence of a wild-type polymerase reduced the mutagenic effects of the hypermutator allele as MA in heterozygotic diploid cells was substantially reduced compared to haploid cells (62–92%; Supplementary Figure S2B). This reduction exceeded the expected 50%, pointing to other factors impacting on Pol ϵ-driven mutagenesis such as the reduced ratio of mutagenic Pol ϵ to Pol δ, which could contribute to increase the frequency of in trans proofreading by Pol δ (29,30) in heterozygous diploid cells. Alternatively, in diploid cells, the wild-type polymerase may be preferentially expressed or used.

The haploid state also unmasked an ultra-mutator phenotype for pol2-A480V, which in heterozygosis did not accumulate significantly more mutations than the corresponding exonuclease deficient strain (compare blue and red P-values, Supplementary Figure S2B). In contrast to SNVs, the strongest pol2-ultra alleles only led to a small but statistically significant increase in the accumulation of INDELs in haploid cells (Supplementary Figure S2C). These findings are in line with previous studies of pol2-P301R, pol2-L439V, pol2-V426L and pol2-D290V cells using classical mutation rate assays (31), and together with our results show that ultra-mutagenesis—the accumulation of considerably more mutations than would be expected by loss of exonuclease activity—is a common outcome for mutations in the proofreading domain of Pol ϵ that are found in cancers.

Diploidization and accumulation of secondary Pol ϵ mutations in a competitive growth assay

Assessing MA in cells propagated through population bottlenecks of ∼3 × 104 cells broadly confirmed our initial observations, despite a larger number of mutations per passage, and much higher variability between different colonies (Figure 1D). These effects likely arose from the experimental settings: population expansion through ∼104 rather than single cell bottlenecks presumably allowed different clones to grow at different rates, and random sampling at each passage would favour the propagation and analysis of fitter, faster-growing clones, which would have completed more DNA replications and therefore accumulated more neutral or adaptive mutations than slower growing ones. The appearance and selection of anti-mutators suppressor mutations (32,33) could also explain the increased variability in mutation numbers. Indeed, when we analysed the prevalence of mutations in POL2 we found evidence of additional mutations in this gene often co-occurring with the stronger mutators (P301R, M459K, S312F, L439V) but only rarely with the weaker mutators or in wild-type cells (Supplementary Figure S3A and B). In line with recent findings (33) we also identified one spontaneous diploidization event in four isolates of the pol2-M459K haploid strain, which co-occurred with several additional pol2 mutations (Supplementary Figure S3A, B; Figure S5C, blue circle).

Exonuclease deficient and pol2-ultra alleles have similar mutational spectra

Analysis of the mutational spectra generated in the absence of Pol ϵ exonuclease activity (pol2–4) revealed a relative increase in the frequency of T>A/A>T transversions, which is further expanded in pol2-ultra cells, while C>G transversions—the rarest class in wild-type strains—becomes relatively rarer (Figure 1E). Analysis of the trinucleotide context in which mutations occur also revealed a very strong resemblance between the mutational patterns generated by the different pol2-ultra mutants and by pol2–4 cells (Figure 1E and Supplementary Figure S4A and B; cosine similarity 0.93–0.98). POLE–mutated tumours display very specific mutation signatures (SBS 10 and 28) in humans and mice (34,35), and these signatures have been recapitulated in human cells engineered to express mutant POLE (36). When we compared the mutational patterns generated by pol2-ultra mutants with cancer signatures we however found only moderate or weak similarities to SBS 14 (Supplementary Figure S4C, cosine similarity 0.66–0.75) or SBS 8 (0.64–0.68).

Most variants generated in the presence of Pol ϵ mutants are likely introduced on the replication leading strand, where the activity of this DNA polymerase is confined (37,38). To measure the replication-strand bias of the observed mutations, we calculated the relative distance of each mutation from the replication origins located to its immediate left and right, and then calculated the mutational density of each complementary mutation pair (e.g. G>A and C>T) as a function of the distance (Figure 1F). This analysis revealed a strong asymmetry (Cohen's w = 0.43–0.51) in the distribution of A>C, A>T and G>T in pol2-P301R cells, and a weaker asymmetry (w = 0.17–0.29) for A>G and C>T mutations. In particular A>C, A>G, A>T, G>A and G>T were observed more frequently on the leading strand than their complementary counterparts (Figure 1F and Supplementary Figure S4D; the low mutation counts for C>G/G>C transversions did not permit establishment of whether a bias is present in this channel). Strikingly, this pattern of mutagenesis was observed in cells carrying both stronger and weaker ultra-mutator alleles, and in pol2–4 cells as well, with the degree of asymmetry increasing with the total number of mutations available for analysis (Supplementary Figure S4E). Taken together, these results strongly suggest a common mechanistic origin for mutations observed in cells lacking Pol ϵ exonuclease activity and in cells carrying ultra-mutator Pol ϵ variants. They also indicate that SNV accumulation in pol2–4 cells does not originate from the mere lack of exonucleolytic activity.

Synergism of Pol ϵ exonuclease domain mutants with MMR deficiency

Mismatch-repair (MMR) recognises different DNA duplex mis-pairs with different efficiencies (39), thereby distorting the frequencies of different mutation classes from the frequencies generated by DNA polymerases. As an approach to determine the mutational patterns as they are generated by Pol ϵ ultra-mutators (those Pol ϵ alleles driving mutagenesis at a higher rate than pol2–4), we attempted to delete MSH2, which encodes a mismatch binding ATPase that is required for all branches of MMR. As previously observed for several other DNA polymerase mutator alleles (16,24,32), sporulation of most pol2-ultra msh2Δ heterozygous diploids did not yield viable double mutant strains for the strongest ultra-mutators (pol2-P301R, pol2-M459K and pol2-S312F). Microscopic observation of these spores revealed that they germinated to vegetative cells but ceased to divide after a few generations, a phenotype consistent with extreme mutational burden leading to ‘error-induced extinction’ (32,33). By contrast, double mutants for the hypermutator pol2–4 in combination with msh2Δ were obtained from corresponding heterozygous diploids but displayed reduced colony size compared to MMR-proficient controls. We also managed to obtain a viable MMR-deficient version of pol2L439V—a relatively weak ultra-mutator. MA analyses of these strains confirmed that, similarly to heterozygous diploids, pol2–4 and pol2-L439V haploid cells accumulated SNVs at a faster rate compared to wild-type strains in the presence of functional MMR (∼4× and ∼12× respectively; Figure 2A and Supplementary Figure S2B), while disruption of MSH2 alone resulted in a ∼16-fold increase (or ∼7.4 × 10–9 SNV/bp/generation, similar to a previous estimate of 4.8 × 10–9 SNV/bp/generation (40)). In contrast, a dramatic increase in SNV accumulation was evident when mismatch repair inactivation was combined with pol2–4 or pol2-L439V (∼421× and ∼874×, respectively; Figure 2A). These numbers are well above the expected mutation rate increases in the double mutants under an additive model (∼21X and ∼28X, respectively) or a multiplicative model (41) (∼69X and ∼196X, respectively), demonstrating a synergistic interaction. Additionally, pol2-L439V accumulated more mutations than pol2–4 even in the absence of MMR, indicating that the ultra-mutator phenotype does not arise from a differential mismatch repair efficiency. Analysis of allele frequencies and status of the mating type (MAT/HM) loci also revealed that both msh2Δpol2–4 and msh2Δpol2-L439V strains were actually diploid. Both strains were homozygous for MSH2 deletion and pol2 mutations, but the former was a MATa/MATa diploid—possibly originating from a whole-genome duplication event caused by failed cytokinesis (33)—and the latter a MATa/MATalpha diploid—possibly originating from homothallic mating after a rare mating-type switch event (Supplementary Figure S5A and B). These results are in line with recent findings in yeast pol2-P301R cells (33) and in POLE P286R-driven mouse cancer models (42), and suggest that a transition to the diploid or polyploid state facilitates survival in the face of extreme mutagenesis as expected from the fact that deleterious mutations are frequently recessive and often masked in a heterozygotic diploid state. Similar to what was previously observed for cells carrying the strongest Pol ϵ hypermutators, we also found that both msh2Δ/msh2Δ pol2–4/pol2–4 and msh2Δ/msh2 pol2-L439V/pol2-L439V accumulated additional mutations in POL2 (Supplementary Figure S3C and D).

Mutation signatures generated by Pol ϵ exonuclease domain alleles in the absence of mismatch repair. (A) Number of SNVs generated during single-cell bottleneck propagation by Pol2 exo(pol2–4) or the pol2-ultra mutant pol2-L439V in the presence or in the absence of functional mismatch repair. The median is indicated in red. (B) Similarity between the SNV patterns generated by pol2–4 and pol2-L439V in the absence of functional mismatch repair and the 30 mutation signatures identified in cancers. Colours indicate cosine similarity (Alexandrov, 2013). (C) Frequency of mutations generated by Pol2 exo- (pol2–4), the pol2-ultra mutant pol2-L439V, or pol2-M644G in the absence of mismatch repair, by type of mutation and trinucleotide context in which the mutations occur; frequencies have been corrected for the trinucleotide frequency in the yeast genome; the the most similar COSMIC signature (SBS14) is also represented. (D) Geographical distribution of the density of mutations by type in relation to replication origins. The origins to the left and to the right of each mutation are indicated by L and R, and the distance is expressed as a percentage of the inter-origin distance of each origin pair. The data for pol2-L439V have been carried over from Figure 1 for comparison purposes. The intensity of each mutation channel is proportional to the intensity of the colour. Asterisks indicate mutation types significantly deviating from uniformity (P< 0.01, χ2 test). (E) Geographical distribution of the density of mutations by type in relation to replication origins generated by the pol2-M644G allele in the presence or in the absence of functional MMR.
Figure 2.

Mutation signatures generated by Pol ϵ exonuclease domain alleles in the absence of mismatch repair. (A) Number of SNVs generated during single-cell bottleneck propagation by Pol2 exo(pol2–4) or the pol2-ultra mutant pol2-L439V in the presence or in the absence of functional mismatch repair. The median is indicated in red. (B) Similarity between the SNV patterns generated by pol2–4 and pol2-L439V in the absence of functional mismatch repair and the 30 mutation signatures identified in cancers. Colours indicate cosine similarity (Alexandrov, 2013). (C) Frequency of mutations generated by Pol2 exo- (pol2–4), the pol2-ultra mutant pol2-L439V, or pol2-M644G in the absence of mismatch repair, by type of mutation and trinucleotide context in which the mutations occur; frequencies have been corrected for the trinucleotide frequency in the yeast genome; the the most similar COSMIC signature (SBS14) is also represented. (D) Geographical distribution of the density of mutations by type in relation to replication origins. The origins to the left and to the right of each mutation are indicated by L and R, and the distance is expressed as a percentage of the inter-origin distance of each origin pair. The data for pol2-L439V have been carried over from Figure 1 for comparison purposes. The intensity of each mutation channel is proportional to the intensity of the colour. Asterisks indicate mutation types significantly deviating from uniformity (P< 0.01, χ2 test). (E) Geographical distribution of the density of mutations by type in relation to replication origins generated by the pol2-M644G allele in the presence or in the absence of functional MMR.

Pol ϵ–L439V and Pol ϵ exo yield replication-strand biased Signature 14

Analysis of the trinucleotide context in which mutations are introduced by Pol ϵ–L439V and Pol ϵ exo, before MMR correction, revealed that these are nearly identical (cosine similarity = 0.992). Comparison of these profiles with the profiles of 30 mutational signatures originally identified in cancers (43), detected a very high similarity to COSMIC Signature SBS 14 (Figure 2B and C, cosine similarity 0.93–0.94) and a moderate similarity to SBS20 (0.73–0.76). In agreement with this finding, SBS 14 was originally identified in uterine cancers and low grade gliomas (2), and is also observed in cancers carrying both POLE mutations and microsatellite instability—the latter a feature of MMR inactivation (44). These findings, together with similar results obtained for pol2-P301R/POL2 msh2Δ/msh2Δ diploid cells (33), further support the notion that SBS 14 is the product of mutagenesis driven by Pol ϵ exonuclease domain mutants, before MMR correction. Interestingly, SBS 20 has been proposed to arise from mutations in DNA polymerase δ in conjunction with MMR inactivation (44).

Replication-strand bias analysis revealed a strong preference for A>C, A>G and G>T mutations on the leading strand, as it was observed in the presence of functional MMR (Figure 2D and Supplementary Figure S4D and F). Conversely, the strong preference for A>T and the weak preference for G>A mutations on the leading strand was reduced or disappeared when MMR was inactivated (Figure 2D and Supplementary Figure S4D and F), suggesting that while Pol2 mutants equally introduced A>T and T>A mutations, MMR corrected one (T>A) more efficiently than the other (A>T). This pattern of mutagenesis is reminiscent of a yeast Pol ϵ mutant, pol2-M644G (45), which carries a mutation in the polymerase domain of Pol ϵ that creates a ‘looser’ active site that allows mis-incorporation of dNTPs and rNTPs (37,46). To identify similarities and differences between the mutational patterns generated by Pol ϵ exonuclease and polymerase-domain mutator alleles, we re-analysed previously published mutation accumulation experiments for pol2-M644G in the presence or absence of MSH2 (45), and compared these to our results.

In the absence of confounding MMR effects, we found that the activity of each mutation channel was substantially different between exonuclease and polymerase domain mutants, with a lower frequency of C>A mutations and a higher frequency of C>T mutations (Figure 2C and E). However, the overall replication-strand bias was strikingly similar, with only one major difference in the A>T/T>A channel: while polymerase domain alleles, before (or after) mismatch correction, were more likely to produce A>T mutations by mispairing T:dT more frequently than A:dA (37,45), exonuclease domain alleles produced both types of mis-pair essentially equally (Figure 2E, Supplementary Figure S4G). The striking similarity between mutations introduced in the genome by Pol ϵ exonuclease and polymerase domain mutants suggest that, with some minor differences, a similar mutagenic process is active in cells carrying either mutant.

A unique signature generated by Pol ϵ exonuclease domain alleles

We next compared the context in which every class of mutation was observed on the two replication strands. To do this, we pooled all mutations from pol2–4 msh2Δ and pol2-L439V msh2Δ strains, given their near identity (cosine similarity = 0.99, Supplementary Figure S6A and B) and apparent common aetiology, and compared them with mutations generated by the polymerase domain allele pol2-M644G in the absence of MSH2 (45). This indicated that alteration of either the exonuclease or polymerase domain leads to the mis-insertion of dCTP opposite to the second T of a –TT– dimer template; less frequently the inverse is also observed (dTTP mis-insertion opposite to the C of a –TC– dimer template, Figure 3A, B red boxes, and Supplementary Figure S6C). Overall, these two classes of mutations were more prevalent in exonuclease than in polymerase-domain mutator strains (∼34% versus ∼10% of all mutations, respectively; P < 0.01, χ2z-test of given proportions) suggesting that a similar mutagenic mechanism occurs with different intensity in different mutants.

Sequence context of the de novo mutations generated in cells with Pol ϵ exonuclease or polymerase domain mutant alleles before MMR correction. (A-F) Sequence context in which each de novo mutation class occurs in cells carrying exonuclease domain mutator alleles (pol2–4 or pol2-L439V) or polymerase domain mutator alleles (pol2-M644G), in the absence of functional MMR (msh2Δ). Mutations, as indicated by the nucleotide change observed on the Watson strand, were categorised by class (A–F) with the frequency of each class indicated in brackets. Within each class mutations were further categorised as the two possible types generating that class and whether each mutation type occurred to the right or to the left of the closest replication origin. The position of the mutation is indicated by a blue column. Transparent diagrams indicate less frequent types/positions in each class, which could arise from artefacts created by leading strand replication not precisely terminating at the inter-origin midpoint or by some origins being passively replicated. In the middle, a diagram describing the mis-pairs at the origin of the indicated mutations and contexts is shown.
Figure 3.

Sequence context of the de novo mutations generated in cells with Pol ϵ exonuclease or polymerase domain mutant alleles before MMR correction. (A-F) Sequence context in which each de novo mutation class occurs in cells carrying exonuclease domain mutator alleles (pol2–4 or pol2-L439V) or polymerase domain mutator alleles (pol2-M644G), in the absence of functional MMR (msh2Δ). Mutations, as indicated by the nucleotide change observed on the Watson strand, were categorised by class (AF) with the frequency of each class indicated in brackets. Within each class mutations were further categorised as the two possible types generating that class and whether each mutation type occurred to the right or to the left of the closest replication origin. The position of the mutation is indicated by a blue column. Transparent diagrams indicate less frequent types/positions in each class, which could arise from artefacts created by leading strand replication not precisely terminating at the inter-origin midpoint or by some origins being passively replicated. In the middle, a diagram describing the mis-pairs at the origin of the indicated mutations and contexts is shown.

A second shared pattern of mutagenesis between polymerase and exonuclease domain mutants of Pol ϵ is the mis-insertion of dTTP in front of G or T, often occurring after a pyrimidine in the template (Figure 3C and D, blue boxes). Notably, these classes of mutations were more frequently observed in MA lines with a polymerase-domain mutator allele (∼33% of all mutations) than in strains with exonuclease-domain mutator alleles (∼19% of all mutations; P < 0.01). Other classes of mutation showed very little or no sequence-context specificity, despite their overall relatively high prevalence (Figure 3C and E; unboxed patterns). A notable exception to this was insertions of A in front of the first A after a long T homopolymer that is followed by an AA dimer (TTTTTAA for example; Figure 3D, green boxes). Despite being a relatively uncommon event (∼2% of all mutations), this signature was unique to exonuclease-domain mutators, being completely absent from the mutational spectra of pol2-M644G msh2Δ cells, and accounted for the observed lack of bias in the A>T/T>A channel that is characteristic of Pol ϵ exonuclease-domain mutators (Figure 2D).

Pol ϵ proofreading domain mutations increase the frequency of insertions

Analysis of the number of insertions/deletions accumulated in the absence of MMR, which would otherwise efficiently repair them, revealed that Pol ϵ exo introduces insertions ∼4.5 times more frequently than wild-type Pol ϵ does, and that this is further increased two-fold in the presence of Pol ϵ–L439V (Figure 4A). Analysis of the type of base inserted revealed that in both cases, mutant Pol ϵ is mainly responsible for the introduction of +T and +A, which overall represent 70–80% of all insertions (Figure 4B, as opposed to ∼31% for msh2Δ strains). Since these insertions likely arise from the role of Pol ϵ as the leading strand replicase (38), replication-strand bias analysis suggests that Pol ϵ–L439V is mostly responsible for +A insertions (Figure 4C). Moreover, these +A insertions on the leading strand tend to occur in the context of a 3–6 nucleotide T homopolymer in the template strand (Figure 4D), suggesting that they originate from polymerase slippage.

Pol ϵ ultra-mutators introduce +A insertions opposite T homopolymers. (A) Rate of insertions and deletions introduced by Polϵ exo– (pol2–4) and the ultra-mutator Polϵ L439V (pol2-L439V) in the presence or absence of functional mismatch repair (msh2Δ). (B) Rate of insertion of different mono- or di-nucleotides. (C) Geographical distribution of the density of the two most common types of insertions in relation to replication origins. (D) Sequence context in which +A/+T insertions occur in relation to wether they occurred to the right or to the left of the closest replication origin; the position of the mutation is indicated by a blue column; the numbers in brackets indicate the prevalence of this class compared to all insertions; the conserved homo-polymer context always appears 3′ of the mutation because INDELs were left-aligned during variant normalization. A box indicates the insertions most frequently observed on the leading strand. (E) Rate of deletion of different mono- or di-nucleotides. (F) Geographical distribution of the density of the two most common types of deletion in relation to replication origins. (G) Sequence context in which –A/–T deletions occur in relation to wether they occurred to the right or to the left of the closest replication origin; the position of the mutation is indicated by a blue column; the numbers in brackets indicate the prevalence of this class compared to all deletions. A box indicates deletions most frequently observed on the lagging strand.
Figure 4.

Pol ϵ ultra-mutators introduce +A insertions opposite T homopolymers. (A) Rate of insertions and deletions introduced by Polϵ exo (pol2–4) and the ultra-mutator Polϵ L439V (pol2-L439V) in the presence or absence of functional mismatch repair (msh2Δ). (B) Rate of insertion of different mono- or di-nucleotides. (C) Geographical distribution of the density of the two most common types of insertions in relation to replication origins. (D) Sequence context in which +A/+T insertions occur in relation to wether they occurred to the right or to the left of the closest replication origin; the position of the mutation is indicated by a blue column; the numbers in brackets indicate the prevalence of this class compared to all insertions; the conserved homo-polymer context always appears 3′ of the mutation because INDELs were left-aligned during variant normalization. A box indicates the insertions most frequently observed on the leading strand. (E) Rate of deletion of different mono- or di-nucleotides. (F) Geographical distribution of the density of the two most common types of deletion in relation to replication origins. (G) Sequence context in which –A/–T deletions occur in relation to wether they occurred to the right or to the left of the closest replication origin; the position of the mutation is indicated by a blue column; the numbers in brackets indicate the prevalence of this class compared to all deletions. A box indicates deletions most frequently observed on the lagging strand.

Pol ϵ proofreading activity generates deletions

Differently from what we have observed for insertions, MA analysis revealed that, before MMR correction, the production of short, mostly single-base, deletions was ∼50% lower in pol2–4 cells compared to wild-type POL2. This suggests that Pol ϵ exonuclease activity is directly responsible for essentially half of the deletions produced during normal DNA replication in the absence of MMR which would otherwise repair them. The Pol ϵ–L439V mutant showed an intermediate deletion rate phenotype, possibly because this mutant could retain partial exonuclease activity, as it was shown for Pol ϵ–P301R. Analysis of the spectrum of deletions implied that a wild-type replisome largely introduces –T, –A, –TT and –AA deletions, and that Pol ϵ exonuclease activity contributes to roughly half of these (Figure 4E). Analysis of replication-strand bias also revealed that the frequent –A and –T deletions do not normally show any discernible strand preference. Inactivation of Pol ϵ exonuclease activity, however, led to a strong bias for –A deletions on the leading strand and corresponding –T deletions on the lagging strand (Figure 4F). These results suggest that both Pol ϵ exonuclease activity and an unidentified process on the lagging strand (possibly Pol δ exonuclease activity) produce frequent –T deletions, giving rise to no overall replication-strand bias. In pol2–4 cells, however, it appears that the leading strand branch of this mutagenic pathway is inactivated, generating a –A deletion bias that is the reflection of the –T deletions produced on the lagging strand (Figure 4G).

DISCUSSION

DNA-replication associated hypermutation is a key risk factor for the accumulation of mutations in genes driving colorectal and endometrial cancers, whether arising from mismatch repair (MMR) inactivation or from DNA polymerase ϵ or δ exonuclease domain mutator (EDM) alleles. While it is clear how MMR inactivation increases mutagenesis, establishing the source of mutations generated by EDM alleles has proven more difficult. The yeast benchmark for EDM alleles, pol2-P301R, generates mutations at a much higher rate than the corresponding exonuclease-dead strain (an ultra-mutator phenotype) (14), while retaining part of the wild-type exonuclease activity (16). Our results now show that the ultra-mutator phenotype is shared by other yeast Pol ϵ EDM alleles orthologous to cancer-associated Pol ϵ mutations, most notably pol2-M459K (human POLE M444K), and that increased mutation accumulation in pol2-ultra cells compared to pol2–4 cells also occurs in the absence of MMR activity, strongly suggesting that differential repair of mismatches is not a major source of hypermutation.

Studies of Pol ϵ protein structure have revealed that the P301R mutation creates a positively charged surface that likely hinders a 3′ mismatched primer end from properly accessing the proofreading domain and could, thus, favour its extension (16,17). Strikingly, pol2-M459K—the second strongest mutator in our dataset—also introduces a positive charge in the same area (Supplementary Figure S7), while the weaker ultra-mutator alleles could also hinder proper DNA strand placement, perhaps by altering Pol ϵ structure more subtly.

The puzzling ultra-mutator phenotype could be explained if most of the mis-pairs observed in cells lacking Pol ϵ exonuclease activity are not just caused by the inability of this mutant polymerase to remove misincorporated bases, but rather arise from an active mutagenic process whose intensity is heightened by Pol ϵ EDM alleles. A prediction of this model is that Pol ϵ ultra and Pol ϵ exo would produce the same pattern of mutations; and, indeed, the mutational profiles, strand bias, and sequence contexts in which mispairs occur in pol2-ultra and pol2–4 cells are virtually indistinguishable from each other, especially before MMR correction. In this scenario, Pol ϵ could contain a ‘mutagenic activity’, normally suppressed by the exonuclease activity, and activated by EDM alleles. The aminoacid changes encoded by these alleles could, for instance, unmask a translesion synthesis activity of Pol ϵ, as it has been shown for the increased ability of exonuclease-deficient Pol δ to bypass AP sites (47,48).

We suggest, however, that at least part of the mutations introduced in pol2-ultra cells arise from the backtracking of the mutant Pol ϵ while attempting proofreading. In a wild-type polymerase, this movement melts the primer-template junction until the nascent 3′ end has been inserted in the exonuclease domain for hydrolysis (Figure 5A). In the absence of hydrolysis—or even more so when access to the exonuclease active site has been blocked by pol2-ultra mutations—the nascent strand would prevent this movement. In these situations, backtracking could still occur if the nascent strand were to shift backwards and extrude a base further upstream (Figure 5B), an activity that would result in the generation of insertions, especially after A:T homopolymers that are easier to melt than G:C ones because of their weaker bonding. At this point, further extension would require the mis-inserted base to form a proper pair with the T template, and thus would occur only when an adenine was mis-inserted in the first place (Figure 5C). In agreement with this model, we found that, in the absence of MMR, pol2–4 cells and to a greater extent pol2-L439V cells, frequently accumulated +A insertions when replicating through T-homopolymers. Furthermore, when two adenines follow the T-homopolymer template, the first one becomes a hotspot for A:A mis-pairs. This would arise from the newly synthesised strand sliding forward after the first base post-mismatch has been introduced, in a model analogous to dislocation mutagenesis (49). This would restore full base pairing, converting the +A insertion into a A:A mispair and generating the observed T>A transversions (Figure 5D). Given the sequence and mis-insertion requirements needed, the above-described events should be comparatively rare; and indeed, insertions represented only ∼9.3% of all the mutations we observed in MMR-deficient cells carrying EDM alleles, while T>A transversions accounted for less than 2%.

Model for the origin of mutations specifically introduced by Pol ϵ exonuclease domain mutants. (A) In the presence of a terminal mismatch (yellow segment), Pol ϵ backtracks (black arrow) and repositions the terminal-mismatched primer in the proofreading domain; after hydrolysis the primer is re-annealed and extended. (B) In the presence of exonuclease-inactive mutants or ultra-mutator mutants backtracking is blocked unless the newly synthesised strand melts, shifts back and extrudes one base (yellow arrow). (C) Melting and extrusion can only occur on homopolymer templates (TTTTT), where the shift does not alter correct base-base pairing between template and nascent strand. If the mis-inserted base is an adenine (yellow A), this can pair with the last T of the template and reconstitute a proper base pair that can be extended, generating insertions. (D) If a T homopolymer is followed by two A’s, then a T would be inserted in front of the first A (red T), but would then move in front of the second, when the base extruded in the homopolymer is re-annealed.
Figure 5.

Model for the origin of mutations specifically introduced by Pol ϵ exonuclease domain mutants. (A) In the presence of a terminal mismatch (yellow segment), Pol ϵ backtracks (black arrow) and repositions the terminal-mismatched primer in the proofreading domain; after hydrolysis the primer is re-annealed and extended. (B) In the presence of exonuclease-inactive mutants or ultra-mutator mutants backtracking is blocked unless the newly synthesised strand melts, shifts back and extrudes one base (yellow arrow). (C) Melting and extrusion can only occur on homopolymer templates (TTTTT), where the shift does not alter correct base-base pairing between template and nascent strand. If the mis-inserted base is an adenine (yellow A), this can pair with the last T of the template and reconstitute a proper base pair that can be extended, generating insertions. (D) If a T homopolymer is followed by two A’s, then a T would be inserted in front of the first A (red T), but would then move in front of the second, when the base extruded in the homopolymer is re-annealed.

With the exception of insertions and T>A transversions the sequence context and strand bias of the remaining mutations, which account for the vast majority (∼89%), closely resembled the context of mutations deposited by a low-fidelity Pol ϵ mutant carrying a mutation in the polymerase domain (pol2-M644G). This strong similarity suggests that the mutations generated by these two Pol ϵ variants arise from the same set of mutagenic mechanisms. At the same time, differences in the activity of the single mutation channels between exonuclease and polymerase domain mutants, could point to a different balance of mutagenic processes. In this regard, it has been shown that Pol δ can proofread the errors introduced by Pol ϵ (29), and does proofread the errors introduced by pol2-P301R (50) and pol2-M644G (30): a different in trans proofreading efficiency in different Pol ϵ mutants could possibly contribute to the observed differences. Variation in the intensity of different mutation channels could equally arise from the increased dNTP levels, which have been shown to contribute to the mutator phenotype of pol2-M644G cells (51), but not that of pol2-P301R cells (16). While these data suggest that the majority of mutations introduced in ultra-mutator cells are directly introduced by Pol ϵ, we cannot however exclude that another polymerase is responsible for introducing mis-pairs on the leading strand after Pol ϵ stalling, for example through the activity of translesion polymerases or Pol δ on the leading strand (52).

In conclusion, our findings have provided further insights into how cells normally guard against mutagenesis during DNA replication, and how specific point mutations in replicative polymerases affect their function to heighten mutation rates and lead to distinctive mutational signatures. Given that DNA replication and replicative polymerases are highly conserved throughout evolution, the effects and mechanisms that we have described likely also operate in cancers with orthologous polymerase alleles.

DATA AVAILABILITY

Primary sequencing data has been deposited at the European Nucleotide archive (https://www.ebi.ac.uk/ena/browser/home) with the accession IDs indicated in Supplementary Table S2 under the study ERP127302. The software used for primary data analysis (from fastq to vcf files) is available through GitHub at https://github.com/fabiopuddu/augur-fermentorum (v0.98). The R scripts for subsequent data analysis are available upon request.

ACKNOWLEDGEMENTS

We thank D. Robles-Espinoza, M. del Castillo, N. Geisler, M. Rashid, K. Wong, I. Martincorena, L. Alexandrov, V. Iyer, C. Bradshaw, T. Keane and members of the S.P.J. laboratory for advice and discussion.

Author contributions: The project was conceived by S.P.J., D.A., F.P., I.S. and J.W.. Manual and automatic propagations were carried out by M.H. and E.A.-P. respectively. DNA extractions were carried out by M.H. Data analysis was carried out by F.P. and M.H. The manuscript was written by F.P. and S.P.J., with contributions from J.W., M.H., D.A. and I.S.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Wellcome Strategic Award [101126/Z/13/Z (COMSIG)]; Wellcome Investigator Award [206388/Z/17/Z]; Wellcome PhD Fellowship [098051 to M.H.]; Cancer Research UK Programme Grant [C6/A18796]; Cancer Research UK [C6946/A24843, G2244/A21717]; Swedish Research Council [2014-6547, 2014-4605], Wellcome [WT203144] Institute Core Funding. Funding for open access charge: Wellcome Trust, CRUK, University of Cambridge.

Conflict of interest statement. None declared.

Notes

Present address: Israel Salguero, Centro Nacional de Investigaciones Oncológicas, Melchor Fernández Almagro, 328029 Madrid, España.

REFERENCES

1. 

Campbell B.B. , LightN., FabrizioD., ZatzmanM., FuligniF., de BorjaR., DavidsonS., EdwardsM., ElvinJ.A., HodelK.P.et al. Comprehensive analysis of hypermutation in human cancer. Cell. 2017; 171:10421056.

2. 

Alexandrov L.B. , Nik-ZainalS., WedgeD.C., AparicioS.A.J.R., BehjatiS., BiankinA.V., BignellG.R., BolliN., BorgÅ., Børresen-DaleA.-L.et al. Signatures of mutational processes in human cancer. Nature. 2013; 500:415421.

3. 

Eshleman J.R. , MarkowitzS.D. Mismatch repair defects in human carcinogenesis. Hum. Mol. Genet.1996; 5:14891494.

4. 

Palles C. , CazierJ.-B., HowarthK.M., DomingoE., JonesA.M., BroderickP., KempZ., SpainS.L., GuarinoE., Guarino AlmeidaE.et al. Germline mutations affecting the proofreading domains of POLE and POLD1 predispose to colorectal adenomas and carcinomas. Nat. Genet.2013; 45:136144.

5. 

Church D.N. , BriggsS.E.W., PallesC., DomingoE., KearseyS.J., GrimesJ.M., GormanM., MartinL., HowarthK.M., HodgsonS.V.et al. DNA polymerase ϵ and δ exonuclease domain mutations in endometrial cancer. Hum. Mol. Genet.2013; 22:28202828.

6. 

Cancer Genome Atlas Research Network Kandoth C. , SchultzN., CherniackA.D., AkbaniR., LiuY., ShenH., RobertsonA.G., PashtanI., ShenR.et al. Integrated genomic characterization of endometrial carcinoma. Nature. 2013; 497:6773.

7. 

Brutlag D. , KornbergA. Enzymatic synthesis of deoxyribonucleic acid. 36. A proofreading function for the 3′ leads to 5′ exonuclease activity in deoxyribonucleic acid polymerases. J. Biol. Chem.1972; 247:241248.

8. 

Khare V. , EckertK.A. The 3′ → 5′ exonuclease of T4 DNA polymerase removes premutagenic alkyl mispairs and contributes to futile cycling at O6-methylguanine lesions. J. Biol. Chem.2001; 276:2428624292.

9. 

Bernad A. , BlancoL., LázaroJ.M., MartínG., SalasM. A conserved 3′—-5′ exonuclease active site in prokaryotic and eukaryotic DNA polymerases. Cell. 1989; 59:219228.

10. 

Hogg M. , OstermanP., BylundG.O., GanaiR.A., LundströmE.-B., Sauer-ErikssonA.E., JohanssonE. Structural basis for processive DNA synthesis by yeast DNA polymerase ϵ. Nat. Struct. Mol. Biol.2014; 21:4955.

11. 

Jain R. , RajashankarK.R., BukuA., JohnsonR.E., PrakashL., PrakashS., AggarwalA.K. Crystal structure of yeast DNA polymerase ϵ catalytic domain. PLoS One. 2014; 9:e94835.

12. 

Reddy M.K. , WeitzelS.E., Hippelvon, P.H. Processive proofreading is intrinsic to T4 DNA polymerase. J. Biol. Chem.1992; 267:1415714166.

13. 

Capson T.L. , PeliskaJ.A., KaboordB.F., FreyM.W., LivelyC., DahlbergM., BenkovicS.J. Kinetic characterization of the polymerase and exonuclease activities of the gene 43 protein of bacteriophage T4. Biochemistry. 1992; 31:1098410994.

14. 

Kane D.P. , ShcherbakovaP.V. A common cancer-associated DNA polymerase ϵ mutation causes an exceptionally strong mutator phenotype, indicating fidelity defects distinct from loss of proofreading. Cancer Res.2014; 74:18951901.

15. 

Morrison A. , BellJ.B., KunkelT.A., SuginoA. Eukaryotic DNA polymerase amino acid sequence required for 3′—-5′ exonuclease activity. Proc. Natl. Acad. Sci. U.S.A.1991; 88:94739477.

16. 

Xing X. , KaneD.P., BulockC.R., MooreE.A., SharmaS., ChabesA., ShcherbakovaP.V. A recurrent cancer-associated substitution in DNA polymerase ϵ produces a hyperactive enzyme. Nat. Commun.2019; 10:374.

17. 

Parkash V. , KulkarniY., BeekTer, J., ShcherbakovaP.V., KamerlinS.C.L., JohanssonE. Structural consequence of the most frequently recurring cancer-associated substitution in DNA polymerase ϵ. Nat. Commun.2019; 10:373.

18. 

Putnam C.D. , KolodnerR.D. Determination of gross chromosomal rearrangement rates. Cold Spring Harb. Protoc.2010; 2010:doi:10.1101/pdb.prot5492.

19. 

Zackrisson M. , HallinJ., OttossonL.-G., DahlP., Fernandez-ParadaE., LändströmE., Fernandez-RicaudL., KaferleP., SkymanA., StenbergS.et al. Scan-o-matic: High-Resolution microbial phenomics at a massive scale. G3 (Bethesda). 2016; 6:30033014.

20. 

Puddu F. , HerzogM., SelivanovaA., WangS., ZhuJ., Klein-LaviS., GordonM., MeirmanR., Millan-ZambranoG., AyestaranI.X.I.et al. Genome architecture and stability in the Saccharomyces cerevisiae knockout collection. Nature. 2019; 573:416420.

21. 

Alexandrov L.B. , Nik-ZainalS., WedgeD.C., CampbellP.J., StrattonM.R. Deciphering signatures of mutational processes operative in human cancer. Cell Rep.2013; 3:246259.

22. 

Blokzijl F. , JanssenR., van BoxtelR., CuppenE. MutationalPatterns: comprehensive genome-wide analysis of mutational processes. Genome Med. 2018; 10:3311.

23. 

Wagih O. ggseqlogo: a versatile R package for drawing sequence logos. Bioinformatics. 2017; 33:36453647.

24. 

Morrison A. , JohnsonA.L., JohnstonL.H., SuginoA. Pathway correcting DNA replication errors in Saccharomyces cerevisiae. EMBO J.1993; 12:14671473.

25. 

Zhu Y.O. , SiegalM.L., HallD.W., PetrovD.A. Precise estimates of mutation rate and spectrum in yeast. PNAS. 2014; 111:E2310E2318.

26. 

Lynch M. , SungW., MorrisK., CoffeyN., LandryC.R., DopmanE.B., DickinsonW.J., OkamotoK., KulkarniS., HartlD.L.et al. A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc. Natl. Acad. Sci. U.S.A.2008; 105:92729277.

27. 

Nishant K.T. , WeiW., ManceraE., ArguesoJ.L., SchlattlA., DelhommeN., MaX., BustamanteC.D., KorbelJ.O., GuZ.et al. The baker's yeast diploid genome is remarkably stable in vegetative growth and meiosis. PLoS Genet.2010; 6:e1001109.

28. 

Zwolshen J.H. , BhattacharjeeJ.K. Genetic and biochemical properties of thialysine-resistant mutants of Saccharomyces cerevisiae. J. Gen. Microbiol.1981; 122:281287.

29. 

Flood C.L. , RodriguezG.P., BaoG., ShockleyA.H., KowY.W., CrouseG.F. Replicative DNA polymerase δ but not ϵ proofreads errors in Cis and in Trans. PLos Genet.2015; 11:e1005049.

30. 

Bulock C.R. , XingX., ShcherbakovaP.V. DNA polymerase δ proofreads errors made by DNA polymerase ϵ. Proc. Natl. Acad. Sci. U.S.A.2020; 117:60356041.

31. 

Barbari S.R. , KaneD.P., MooreE.A., ShcherbakovaP.V. Functional analysis of cancer-associated DNA polymerase ϵ variants in Saccharomyces cerevisiae. G3 (Bethesda). 2018; 8:10191029.

32. 

Williams L.N. , HerrA.J., PrestonB.D. Emergence of DNA polymerase ϵ antimutators that escape error-induced extinction in yeast. Genetics. 2013; 193:751770.

33. 

Tracy M.A. , LeeM.B., HearnB.L., DowsettI.T., ThurberL.C., LooJ., LoebA.M., PrestonK., TuncelM.I., GhodsianN.et al. Spontaneous polyploids and antimutators compete during the evolution of saccharomyces cerevisiae mutator cells. Genetics. 2020; 215:959974.

34. 

Galati M.A. , HodelK.P., GamsM.S., SudhamanS., BridgeT., ZahurancikW.J., UngerleiderN.A., ParkV.S., ErcanA.B., JoksimovicL.et al. Cancers from novel pole-mutant mouse models provide insights into polymerase-mediated hypermutagenesis and immune checkpoint blockade. Cancer Res.2020; 80:56065618.

35. 

Li H.-D. , CuevasI., ZhangM., LuC., AlamM.M., FuY.-X., YouM.J., AkbayE.A., ZhangH., CastrillonD.H. Polymerase-mediated ultramutagenesis in mice produces diverse cancers with high mutational load. J. Clin. Invest.2018; 128:41794191.

36. 

Hodel K.P. , SunM.J.S., UngerleiderN., ParkV.S., WilliamsL.G., BauerD.L., ImmethunV.E., WangJ., SuoZ., LuH.et al. POLE mutation spectra are shaped by the mutant allele identity, its abundance, and mismatch repair status. Mol. Cell. 2020; 78:11661177.

37. 

Pursell Z.F. , IsozI., LundströmE.-B., JohanssonE., KunkelT.A. Yeast DNA polymerase epsilon participates in leading-strand DNA replication. Science (New York, N.Y.). 2007; 317:127130.

38. 

Nick McElhinny S.A. , GordeninD.A., StithC.M., BurgersP.M.J., KunkelT.A. Division of labor at the eukaryotic replication fork. Mol. Cell. 2008; 30:137144.

39. 

Marsischky G.T. , KolodnerR.D. Biochemical characterization of the interaction between the Saccharomyces cerevisiae MSH2-MSH6 complex and mispaired bases in DNA. J. Biol. Chem.1999; 274:2666826682.

40. 

Lang G.I. , ParsonsL., GammieA.E. Mutation rates, spectra, and genome-wide distribution of spontaneous mutations in mismatch repair deficient yeast. G3 (Bethesda). 2013; 3:14531465.

41. 

Morrison A. , SuginoA. The 3′→5′ exonucleases of both DNA polymerases delta and epsilon participate in correcting errors of DNA replication in Saccharomyces cerevisiae. Mol. Gen. Genet.1994; 242:289296.

42. 

Li H.-D. , LuC., ZhangH., HuQ., ZhangJ., CuevasI.C., SahooS.S., AguilarM., MauraisE.G., ZhangS.et al. A PoleP286R mouse model of endometrial cancer recapitulates high mutational burden and immunotherapy response. JCI Insight. 2020; 5:2311.

43. 

Tate J.G. , BamfordS., JubbH.C., SondkaZ., BeareD.M., BindalN., BoutselakisH., ColeC.G., CreatoreC., DawsonE.et al. COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res.2019; 47:D941D947.

44. 

Haradhvala N.J. , KimJ., MaruvkaY.E., PolakP., RosebrockD., LivitzD., HessJ.M., LeshchinerI., KamburovA., MouwK.W.et al. Distinct mutational signatures characterize concurrent loss of polymerase proofreading and mismatch repair. Nat. Commun.2018; 9:17461749.

45. 

Lujan S.A. , ClausenA.R., ClarkA.B., MacAlpineH.K., MacAlpineD.M., MalcE.P., MieczkowskiP.A., BurkholderA.B., FargoD.C., GordeninD.A.et al. Heterogeneous polymerase fidelity and mismatch repair bias genome variation and composition. Genome Res.2014; 24:17511764.

46. 

Nick McElhinny S.A. , KumarD., ClarkA.B., WattD.L., WattsB.E., LundströmE.-B., JohanssonE., ChabesA., KunkelT.A. Genome instability due to ribonucleotide incorporation into DNA. Nat. Chem. Biol.2010; 6:774781.

47. 

Meng X. , ZhouY., ZhangS., LeeE.Y.C., FrickD.N., LeeM.Y.W.T DNA damage alters DNA polymerase delta to a form that exhibits increased discrimination against modified template bases and mismatched primers. Nucleic Acids Res.2009; 37:647657.

48. 

Hirota K. , YoshikiyoK., GuilbaudG., TsurimotoT., MuraiJ., TsudaM., PhillipsL.G., NaritaT., NishiharaK., KobayashiK.et al. The POLD3 subunit of DNA polymerase δ can promote translesion synthesis independently of DNA polymerase ζ. Nucleic Acids Res.2015; 43:16711683.

49. 

Kunkel T.A. , SoniA. Mutagenesis by transient misalignment. J. Biol. Chem.1988; 263:1478414789.

50. 

Bulock C.R. , XingX., ShcherbakovaP.V. Mismatch repair and DNA polymerase δ proofreading prevent catastrophic accumulation of leading strand errors in cells expressing a cancer-associated DNA polymerase ε variant. Nucleic Acids Res.2020; 48:91249134.

51. 

Williams L.N. , MarjavaaraL., KnowelsG.M., SchultzE.M., FoxE.J., ChabesA., HerrA.J. dNTP pool levels modulate mutator phenotypes of error-prone DNA polymerase ϵ variants. Proc. Natl. Acad. Sci. U.S.A.2015; 112:E2457E2466.

52. 

Garbacz M.A. , CoxP.B., SharmaS., LujanS.A., ChabesA., KunkelT.A. The absence of the catalytic domains of Saccharomyces cerevisiae DNA polymerase ε strongly reduces DNA replication fidelity. Nucleic Acids Res.2019; 47:39863995.