Nucleic Acids Research

Animal-APAdb: a comprehensive animal alternative polyadenylation database

Weiwei Jin, Qizhao Zhu, Yanbo Yang, Wenqian Yang, Dongyang Wang, Jiajun Yang, Xiaohui Niu, Debing Yu, Jing Gong

The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.

https://doi.org/10.1093/nar/gkaa778, Volume: 49, Issue: D1, Pages: 1-8

Article Type: Research Article

Publisher: Oxford University Press

Table of Contents

INTRODUCTION
MATERIALS AND METHODS
IMPLEMENTATION
DATABASE CONTENT AND USAGE
SUMMARY AND FUTURE DIRECTIONS
FUNDING

Abstract

Alternative polyadenylation (APA) is an important post-transcriptional regulatory mechanism that recognizes different polyadenylation signals on transcripts, resulting in transcripts with different lengths of 3′ untranslated regions and thereby influencing a series of biological processes. Recent studies have highlighted the important roles of APA in humans. However, APA profiles in other animals have not been fully characterized, and no database provides comprehensive APA information for animals other than human. Here, using RNA sequencing data collected from public databases, we systematically characterized the APA profiles in 9244 samples from 18 species. In total, we identified 342 952 APA events (median: 17 020 per species) using the DaPars2 algorithm and 315 691 APA events (median: 17 953 per species) using the QAPA algorithm in these 18 species. In addition, we predicted the polyadenylation sites (PAS) and the motifs near PAS in these species. We further developed Animal-APAdb, a user-friendly database (http://gong_lab.hzau.edu.cn/Animal-APAdb/) for data searching, browsing and downloading. With comprehensive information on APA events in different tissues of different species, Animal-APAdb may greatly facilitate the exploration of animal APA patterns and novel mechanisms, gene expression regulation and APA evolution across tissues and species.

INTRODUCTION

Alternative polyadenylation (APA) is a widespread mechanism that generates transcript isoforms with different lengths of 3′ untranslated regions (3′UTR) by recognizing different polyadenylation signals (1). This can alter important regulatory elements, such as miRNA binding sites and RNA-binding protein sites, and thus affect mRNA stability, localization and translation (2,3). It has been revealed that approximately 70% of eukaryotic genes possess multiple functional polyadenylation sites (PAS) (3–6) and that nearly half of the genes in fruitfly (7), worm (8) and zebrafish (9) undergo APA. APA-mediated gene regulation functions in a tissue-specific (3,10) and cell-specific (11,12) manner. For example, brain and neuronal cells tend to have longer 3′UTRs than testis and ovary cells (13,14). Global 3′UTR shortening has been found in proliferating cells, cancer cells and tumor samples (13,15–17), whereas 3′UTR lengthening is associated with embryonic differentiation (16) and animal neurogenesis (18). Recent studies have highlighted the important roles of APA in humans. Several APA dysregulations have been identified in human diseases (6–9), such as diabetic nephropathy, systemic lupus erythematosus and muscular dystrophy (19). However, the scope of gene regulation at the level of cleavage and polyadenylation in animals other than human has not been well characterized.

Several methods have been developed to identify PAS and quantify APA events (1,20–23). Early APA identification methods based on complementary DNAs, expressed sequence tags and 3′-sequencing data can only detect a limited number of APA events, and RNA sequencing (RNA-seq) has become an alternative technology for detecting APA events at the genome level (24–26). Accordingly, several algorithms have been developed for the identification of APA events from RNA-seq data. These are either de novo identification algorithms, including IsoSCM (27), DaPars (15,28), APAtrap (29) and TAPAS (30), or annotation-based algorithms, such as MISO (31), roar (32) and QAPA (33). In human, the TC3A (34) and APAatlas (24) databases systematically characterize APA events in different tissues using a large amount of RNA-seq data from The Cancer Genome Atlas and the Genotype-Tissue Expression project, respectively. However, no database provides comprehensive APA information across a large number of tissues for animals other than human.

In this study, we systematically characterized APA profiles in 9244 samples from 18 species using RNA-seq data collected from public databases. These species are baboon, chicken, chimp, clawed frog, cow, crab-eating macaque, dog, fruitfly, green monkey, horse, mouse, pig, rabbit, rat, rhesus, sheep, worm and zebrafish. In addition, we predicted the PAS and the motifs near PAS (APA motifs) of these species. Finally, we developed Animal-APAdb (http://gong_lab.hzau.edu.cn/Animal-APAdb/), a user-friendly database for the browsing, searching and downloading of APA-related information.

MATERIALS AND METHODS

Collection and processing of RNA-seq data

To obtain a complete list of RNA-seq data from animals other than human, we conducted a comprehensive search of the Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra) (35,36) of the National Center for Biotechnology Information. The following search terms were used for SRA searching on 10 December 2019: (‘cdna’[Selection]) AND (‘transcriptomic’[Source]) AND (‘rna seq’[Strategy]) NOT (‘human’[Organism]) NOT (‘single cell’[Text Word]), and a total of 443 318 records were obtained. The quantification of APA events requires certain files, including gene_bed.file, gene_symbol_file, ensembl_identifiers.txt, gencode.basic.txt, genome.fa and genome.annotation.gtf from UCSC (https://genome.ucsc.edu/) (37) and Ensembl (http://www.ensembl.org) (38). The candidate species were therefore screened based on the availability of the required files and ranked by sample size. As a result, a total of 18 species were selected for further study. Then, we extracted all BioProjects from the sample list and checked each description manually. In this step, 106 BioProjects of normal tissues were retained. The non-redundant raw RNA-seq data from these BioProjects were downloaded, converted into standard fastq, subjected to quality control using FastQC (version: v0.11.8), cleaned with Trim Galore (version: 0.6.4_dev) and then aligned to the corresponding reference genome using HISAT2 (39). Subsequently, samples and BioProjects with low mapping rates were discarded, and finally 9244 samples from 97 BioProjects were retained (Figure 1A).

Figure 1.

Flow chart of Animal-APAdb. (A) Data collection. (B) Data processing. (C) Storage and display structures of Animal-APAdb.

Identification of PAS and PAS cluster

Recent studies have demonstrated the possibility of using de novo algorithms to identify novel PAS based on RNA-seq data (15,28). Here, we used the well-established de novo algorithm DaPars2 (15) to identify the alternative proximal PAS within each sample. Based on the two-PAS model, DaPars2 applies a linear regression model to infer the location of the APA site within the 3′UTR region. Because the position of a PAS predicted by DaPars2 might be inconsistent among different samples, sites whose positions lie within 24 nt of one another were grouped into a cluster (Figure 1B) (40,41). For a gene, the median position of a PAS cluster is usually the most representative site among samples, so the median site was defined as the PAS.
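The clustering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used to build the database: the function names are hypothetical, the 24 nt rule is interpreted as a limit on the gap between neighbouring sorted positions, and the lower median is used so that the representative site is always one of the observed positions.

```python
from statistics import median_low


def cluster_pas(positions, max_gap=24):
    """Group predicted PAS positions of one gene into clusters.

    Assumption: a site joins the current cluster when it lies within
    max_gap nt of the previous (sorted) site.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def representative_pas(cluster):
    """Return the median site of a cluster.

    median_low is an assumption: it returns an observed position even
    when the cluster holds an even number of sites.
    """
    return median_low(cluster)


# Made-up proximal-site predictions for one gene across six samples.
sites = [1000, 1010, 1022, 1500, 1512, 3000]
clusters = cluster_pas(sites)
representatives = [representative_pas(c) for c in clusters]
```

With these invented positions, the six predictions collapse into three clusters, each represented by a single PAS.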

Identification of alternative polyadenylation

In this study, we used two popular algorithms, DaPars2 and QAPA, to quantify APA events from standard RNA-seq data. DaPars2 predicts only a single proximal site and takes the end of the 3′UTR as the distal site by default, so we used the percentage of distal poly(A) site usage index (PDUI) to quantify APA events. The PDUI is an intuitive ratio for quantifying APA events based on RNA-seq data (28). It is calculated as the expression level of the isoform with the distal poly(A) site divided by the total expression level of the isoforms with the distal and proximal poly(A) sites. To reduce false positives, we discarded the PDUIs of transcripts for which the coverage of the last exon was <30× or the percentage of samples supporting the PAS cluster (SampleP) was <5% (Figure 1B) (24,28). QAPA is based on transcript-level abundance and can calculate the relative proportion of each isoform in a gene using PAS annotation files from the GENCODE basic poly(A) annotation track, PolyASite (42) and/or a custom file, so we used Poly(A) Usage (PAU) to quantify APA events. Because PAS annotation files are lacking for most animals, we first created PAS annotation files from the PAS extracted from the DaPars2 results with SampleP ≥ 5% (Figure 1B). Since mouse and worm have PAS annotation files in the PolyASite database, these files were merged with our PAS annotation files for the QAPA calculation.
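The two usage measures and the filtering rule described above can be illustrated with a short Python sketch. It is not the DaPars2 or QAPA implementation: the function names, the input values and the choice to express PAU as a fraction are assumptions made for this example, and isoform expression levels are taken as already estimated.

```python
def pdui(distal_expr, proximal_expr):
    """Percentage of distal poly(A) site usage index.

    PDUI = distal / (distal + proximal); returns None when neither
    isoform is expressed.
    """
    total = distal_expr + proximal_expr
    if total == 0:
        return None
    return distal_expr / total


def keep_pdui(last_exon_coverage, sample_p, min_coverage=30.0, min_sample_p=5.0):
    """Apply the two filters from the text: a PDUI is discarded when
    last-exon coverage is <30x or SampleP is <5%."""
    return last_exon_coverage >= min_coverage and sample_p >= min_sample_p


def pau(isoform_expr):
    """Relative usage of each 3'UTR isoform of one gene.

    Assumption: returned as a fraction of the gene's total expression.
    """
    total = sum(isoform_expr.values())
    if total == 0:
        return {name: None for name in isoform_expr}
    return {name: value / total for name, value in isoform_expr.items()}


# Made-up expression values for illustration only.
example_pdui = pdui(distal_expr=30.0, proximal_expr=10.0)  # 0.75
example_pau = pau({"proximal": 20.0, "middle": 30.0, "distal": 50.0})
```

A PDUI close to 1 indicates that the long (distal) isoform dominates, whereas a value close to 0 indicates predominant use of the proximal site.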

Identification of APA motifs

Polyadenylation is the result of an RNA processing reaction. In the polyadenylation process, a multiprotein complex assembles on specific sequences of the pre-mRNAs, which are called the cleavage and polyadenylation signals (pA signals) (43). pA signals are composed of sequences that flank either side of the position where the pre-mRNA is endonucleolytically cleaved and subsequently polyadenylated (43). The classic pA signal is a bipartite sequence element that usually consists of a PAS hexamer, as well as motifs upstream and downstream of the cleavage site. In this study, we scanned the 50 nt upstream of each PAS (1,26) for PAS hexamers using DREME (44). In addition, for each PAS, motifs within the 200 nt upstream and the 200 nt downstream of the PAS (1) were obtained using MEME (45). Motifs were then filtered out if they met any of the following conditions: the statistical significance of the motif (E-value) > 0.05, the percentage of sites contributing to the construction of the motif (CountP) < 5% for MEME, or the percentage of sequences matching the motif (CountP) < 5% for DREME (Figure 1B).
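The motif filter can be sketched as follows. The sketch assumes that the listed conditions describe motifs to be discarded, so a motif is kept only when its E-value is at most 0.05 and its CountP is at least 5%. The `Motif` record and the example values are hypothetical and are not taken from MEME or DREME output.

```python
from dataclasses import dataclass


@dataclass
class Motif:
    sequence: str
    e_value: float   # statistical significance reported by the motif tool
    count_p: float   # percentage of sites or sequences supporting the motif


def is_valid(motif, max_e_value=0.05, min_count_p=5.0):
    """Keep a motif only if it passes both thresholds."""
    return motif.e_value <= max_e_value and motif.count_p >= min_count_p


# Invented records for illustration only.
candidates = [
    Motif("AATAAA", e_value=1e-20, count_p=18.0),
    Motif("GGAGGA", e_value=0.2, count_p=9.0),    # fails the E-value filter
    Motif("TGTAAA", e_value=1e-5, count_p=2.5),   # fails the CountP filter
]
valid = [m.sequence for m in candidates if is_valid(m)]
```

Only the first invented record passes both thresholds.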

IMPLEMENTATION

Animal-APAdb (http://gong_lab.hzau.edu.cn/Animal-APAdb/) was built on the ThinkPHP (version 5.0.24) framework and Bootstrap 4, and runs on the Apache 2 web server with MySQL (version 5.7.29) as its database engine and Highcharts for graph drawing (Figure 1C). Animal-APAdb is available online without registration and is optimized for Chrome (recommended), Internet Explorer, Opera, Firefox, Microsoft Edge and macOS Safari.

DATABASE CONTENT AND USAGE

Samples of 18 species in Animal-APAdb

In total, 9244 samples from 18 species were analyzed in Animal-APAdb, ranging from 87 samples in crab-eating macaque to 1235 samples in mouse (Table 1). Detailed information, including the number of samples per species, reference genome versions and the number of APA events, is available on the ‘Document’ page. The sample information of each species is presented in the ‘BioProjects of each species’ module on the ‘Document’ page, including species, BioProject ID, library layout, sample size and breed.

Table 1.

Data summary in Animal-APAdb

Species	No. of samples	APA events identified by DaPars2	APA events identified by QAPA	Identified PAS	Genes with multiple PAS (%)
Papio anubis (Baboon)	766	2657	2401	11 694	21.73
Gallus gallus (Chicken)	656	25 600	20 680	72 508	60.06
Pan troglodytes (Chimp)	262	11 524	14 447	41 063	31.66
Xenopus tropicalis (Clawed frog)	284	19 782	18 164	49 648	59.07
Bos taurus (Cow)	838	17 203	17 741	68 535	55.41
Macaca fascicularis (Crab-eating macaque)	87	29 269	26 956	60 990	54.83
Canis lupus familiaris (Dog)	292	16 837	14 066	55 969	47.21
Drosophila melanogaster (Fruitfly)	774	7332	8572	34 261	52.12
Chlorocebus sabaeus (Green monkey)	327	13 922	14 645	43 972	41.18
Equus caballus (Horse)	160	11 149	7186	36 641	44.59
Mus musculus (Mouse)	1235	54 448	53 710	166 132	43.26
Sus scrofa (Pig)	819	36 280	24 441	160 005	61.41
Oryctolagus cuniculus (Rabbit)	338	7687	7442	22 165	40.88
Rattus norvegicus (Rat)	901	19 605	20 378	74 092	47.71
Macaca mulatta (Rhesus)	257	29 138	20 625	82 352	39.65
Ovis aries (Sheep)	730	7029	4189	26 652	32.27
Caenorhabditis elegans (Worm)	319	17 218	18 459	28 830	24.79
Danio rerio (Zebrafish)	199	16 272	21 589	46 991	40.31
Sum	9244	342 952	315 691	1 082 500	-
Max	1235	54 448	53 710	166 132	61.41
Min	87	2657	2401	11 694	21.73
Median	333	17 020	17 953	48 320	43.93

APA events in Animal-APAdb

Because de novo identification may introduce some false positives, some of the results were filtered out as described above. After filtering, we identified a total of 342 952 APA events (median: 17 020 per species) using the DaPars2 algorithm and 315 691 APA events (median: 17 953 per species) using the QAPA algorithm in these 18 species. A summary of these APA events is shown in the ‘APA event summary’ module on the ‘Document’ page and in Table 1.

PAS in Animal-APAdb

Using DaPars2, we identified a total of 1 082 500 PAS in these species, ranging from 11 694 in baboon to 166 132 in mouse. About 44% of genes have multiple PAS, ranging from 22% in baboon to 61% in pig. Across all species, the 3′UTRs of genes with multiple PAS (median: 773 nt) are markedly longer than those of genes with a single PAS (median: 149 nt) (Figure 2A). We then counted the occurrences of the classic polyadenylation signal AATAAA and its 1 nt variants within the 50 nt upstream of each PAS (1,26,46), and found that about 18% of PAS have these signals, which is similar to the 15% reported in another study (26).
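Counting the classic signal and its 1 nt variants can be sketched in Python as shown here. The function names are hypothetical, and the sketch assumes that the 50 nt upstream sequences have already been extracted in transcript orientation, so strand handling and genome access are left out.

```python
def one_nt_variants(hexamer="AATAAA"):
    """Return the hexamer plus every sequence that differs from it at
    exactly one position (1 + 6 * 3 = 19 sequences)."""
    variants = {hexamer}
    for i, base in enumerate(hexamer):
        for alt in "ACGT":
            if alt != base:
                variants.add(hexamer[:i] + alt + hexamer[i + 1:])
    return variants


def has_signal(upstream_seq, signals):
    """True if any 6-mer window of the upstream sequence is a signal."""
    seq = upstream_seq.upper()
    return any(seq[i:i + 6] in signals for i in range(len(seq) - 5))


def fraction_with_signal(upstream_seqs):
    """Fraction of PAS whose upstream sequence carries a signal."""
    if not upstream_seqs:
        return 0.0
    signals = one_nt_variants()
    hits = sum(has_signal(seq, signals) for seq in upstream_seqs)
    return hits / len(upstream_seqs)


# Two short made-up upstream sequences; only the first contains ATTAAA.
example_fraction = fraction_with_signal(["CCCCATTAAACCCC", "GGGGGGGGGG"])
```

Applied to the real upstream sequences of all PAS, a calculation of this kind would yield the percentage reported in the text.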

Figure 2.

Some results of PAS and APA motifs. (A) 3′UTR length differences between single PAS genes and multi PAS genes. (B) A case of motifs at upstream 200 nt. (C) A case of motifs at downstream 200 nt.

APA motifs in Animal-APAdb

Using the MEME and DREME tools and the thresholds mentioned above, we obtained a total of 336 valid motifs, including 154 PAS hexamers, 90 motifs at 200 nt upstream and 92 motifs at 200 nt downstream. Among the PAS hexamers, the most frequent motifs are GGAGGA and TGTAAA, which are present in 11 species, followed by GGAAGA, TGTATA and AGAAGA. It is difficult to determine the differences or similarities between motifs generated by MEME because of their different lengths. However, some similar short sequences can still be found in the motifs of different species, such as GAGGAAGA, CTGCTG and their variants at 200 nt upstream (Figure 2B), and A-rich sequences, CTGCAG and their variants at 200 nt downstream (Figure 2C).

Web interface

Animal-APAdb provides a user-friendly web interface. Four main modules, ‘APA Events’, ‘PAS’, ‘APA Motifs’ and ‘Download’ (Figure 3A), allow users to query APA events of genes in the tissues of a given species, retrieve PAS in a gene or genomic region of interest, browse probable APA motifs and download the corresponding datasets.

Figure 3.

Overview of the Animal-APAdb. (A) The main functions in Animal-APAdb, including ‘APA Events’, ‘PAS’, ‘APA Motifs’ and ‘Download’ modules. (B) A table with species, tissue, gene symbol, Ensembl ID and Ensembl Trans ID of queried APA events. (C) The PAS graph of the queried gene. (D) The box-plot graph of APA events of the queried gene. (E) An example of search results in the ‘PAS’ module. (F) A case in the ‘APA Motifs’ module.

On the ‘APA Events’ page, users can query APA events by selecting an algorithm, species and tissue and typing a gene symbol or Ensembl gene ID in the search box. A table with the species, tissue, gene symbol, Ensembl ID and Ensembl Transcript ID of the queried APA events will be shown (Figure 3B). Then, by clicking the ‘Plot’ button, users can view the position graph (Figure 3C), which shows the range of the 3′UTR of the gene and the position of the PAS, and a box-plot graph of APA events (Figure 3D). It is worth noting that QAPA can calculate the usage of multiple sites using the PAS annotation file, and the distal site may differ from the end of the 3′UTR. Hence, users who select the QAPA algorithm need to click a point on the position graph to retrieve the box-plot graph.

On the ‘PAS’ page, the users can select a species and input a genomic region (e.g. chr1:1–2000000:+), gene symbol or Ensembl ID to query the PAS clusters. Then, a table will be presented to provide details of the cluster with gene symbol, Ensembl ID, site ID, 3′UTR, PAS cluster, all PAS in the cluster (PAS ClusterS), PAS, the percentage (SampleP) and number (SampleS) of samples that support this PAS cluster and signals (Figure 3E). The users can click the ‘Download’ button to download the queried data, or click the ‘?’ button for more information.
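The genomic region format shown in the example (chromosome, start–end range and strand, separated by colons) can be validated with a small parser before a query is submitted. The sketch below is purely illustrative: how the Animal-APAdb server parses this field is not described in the text, and the `Region` type and `parse_region` function are hypothetical names.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str


# Accept either a hyphen or an en dash between start and end.
_REGION = re.compile(
    r"^(?P<chrom>[^:\s]+):(?P<start>\d+)[-\u2013](?P<end>\d+):(?P<strand>[+-])$"
)


def parse_region(text):
    """Parse a string such as 'chr1:1-2000000:+' into a Region."""
    match = _REGION.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid region: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"start is greater than end: {text!r}")
    return Region(match["chrom"], start, end, match["strand"])


region = parse_region("chr1:1\u20132000000:+")
```

Rejecting malformed input early gives a clearer error than an empty result table.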

On the ‘APA Motifs’ page, when the users select the species and motif location, a table with species, motif location, motif, CountP and E-value will be provided, and more detailed reports can be obtained by clicking the ‘More Detail’ button (Figure 3F).

In Animal-APAdb, the main tissue datasets for each species are freely available from the ‘Download’ page. The ‘Document’ page provides the sample information, reference genome versions, APA event summary, database construction pipeline and other information. In addition, Animal-APAdb welcomes feedback via the email address provided on the ‘Contact’ page.

SUMMARY AND FUTURE DIRECTIONS

Great progress has been achieved in animal genome research in recent decades. Several animal-related databases, such as AnimalQTLdb (47) and Animal-ImputeDB (48), have been widely used by researchers. However, there are still large gaps in research on the mechanisms and functions of APA in animals other than human. In this study, we developed Animal-APAdb by collecting publicly available data, and provided comprehensive APA information for different tissues in 18 species. To the best of our knowledge, Animal-APAdb is the largest and most comprehensive animal APA database to date. In this version of Animal-APAdb, using the data of 9244 samples, we provide numerous PAS in multiple species and identify a large number of APA events in different tissues, together with probable APA motifs. In the future, we will integrate more samples and species into Animal-APAdb and continue to update the database. With comprehensive APA information in various tissues of different species, we believe that Animal-APAdb will be useful for uncovering animal APA patterns and novel mechanisms, gene expression regulation and APA evolution across tissues and species.

FUNDING

National Natural Science Foundation of China [31970644 to J.G.]; Huazhong Agricultural University Scientific & Technological Self-innovation Foundation [11041810351 to J.G.]; Jiangsu Agricultural Science and Technology Independent Innovation Fund [CX (17) 3014 to D.B.Y.]; Fundamental Research Funds for the Central University (Huazhong Agricultural University) [2662017JC048 to X.H.N.]. Funding for open access charge: Jiangsu Agricultural Science and Technology Independent Innovation Fund [CX (17) 3014 to D.B.Y.].

Conflict of interest statement. None declared.

REFERENCES

1. Gruber A.J., Schmidt, Gruber A.R., Martin, Ghosh, Belmadani, Keller, Zavolan. A comprehensive analysis of 3′ end sequencing data sets reveals novel polyadenylation signals and the repressive role of heterogeneous ribonucleoprotein C on cleavage and polyadenylation. Genome Res. 2016; 26:1145–1159.

2. Elkon, Ugalde A.P., Agami. Alternative cleavage and polyadenylation: extent, regulation and function. Nat. Rev. Genet. 2013; 14:496–506.

3. Tian, Manley J.L. Alternative polyadenylation of mRNA precursors. Nat. Rev. Mol. Cell Biol. 2017; 18:18–30.

4. Hoque, Ji, Zheng, Luo, Li, You, Park J.Y., Yehia, Tian. Analysis of alternative cleavage and polyadenylation by 3′ region extraction and deep sequencing. Nat. Methods. 2013; 10:133–139.

5. Bartel D.P. Widespread influence of 3′-end structures on mammalian mRNA processing and stability. Cell. 2017; 169:905–917.

6. Mayr. Evolution and biological roles of alternative 3′UTRs. Trends Cell Biol. 2016; 26:227–237.

7. Smibert, Miura, Westholm J.O., Shenker, May, Duff M.O., Zhang, Eads B.D., Carlson, Brown J.B. et al. Global patterns of tissue-specific alternative polyadenylation in Drosophila. Cell Rep. 2012; 1:277–289.

8. Jan C.H., Friedman R.C., Ruby J.G., Bartel D.P. Formation, regulation and evolution of Caenorhabditis elegans 3′UTRs. Nature. 2011; 469:97–101.

9. Ulitsky, Shkumatava, Jan C.H., Subtelny A.O., Koppstein, Bell G.W., Sive, Bartel D.P. Extensive alternative polyadenylation during zebrafish development. Genome Res. 2012; 22:2054–2066.

10. Lianoglou, Garg, Yang J.L., Leslie C.S., Mayr. Ubiquitously transcribed genes use alternative polyadenylation to achieve tissue-specific expression. Genes Dev. 2013; 27:2380–2396.

11. MacDonald C.C. Tissue-specific mechanisms of alternative polyadenylation: testis, brain, and beyond (2018 update). Wiley Interdiscip. Rev. RNA. 2019; 10:e1526.

12. Di Giammartino D.C., Nishida, Manley J.L. Mechanisms and consequences of alternative polyadenylation. Mol. Cell. 2011; 43:853–866.

13. Sandberg, Neilson J.R., Sarma, Sharp P.A., Burge C.B. Proliferating cells express mRNAs with shortened 3′ untranslated regions and fewer microRNA target sites. Science. 2008; 320:1643–1647.

14. Guvenek, Tian. Analysis of alternative cleavage and polyadenylation in mature and differentiating neurons using RNA-seq data. Quant. Biol. 2018; 6:253–266.

15. Xiang, Ye, Lou, Yang, Cai, Zhang, Mills, Chen N.Y., Kim, Muge Ozguc et al. Comprehensive characterization of alternative polyadenylation in human cancer. J. Natl. Cancer Inst. 2018; 110:379–389.

16. Lee J.Y., Pan, Jiang, Tian. Progressive lengthening of 3′ untranslated regions of mRNAs by alternative polyadenylation during mouse embryonic development. Proc. Natl. Acad. Sci. U.S.A. 2009; 106:7028–7033.

17. Mayr, Bartel D.P. Widespread shortening of 3′UTRs by alternative cleavage and polyadenylation activates oncogenes in cancer cells. Cell. 2009; 138:673–684.

18. Miura, Shenker, Andreu-Agullo, Westholm J.O., Lai E.C. Widespread and extensive lengthening of 3′ UTRs in the mammalian brain. Genome Res. 2013; 23:812–825.

19. Chang J.W., Yeh H.S., Yong. Alternative polyadenylation in human diseases. Endocrinol. Metab. 2017; 32:413–421.

20. Wang, Nambiar, Zheng, Tian. PolyA_DB 3 catalogs cleavage and polyadenylation sites identified by deep sequencing in multiple genomes. Nucleic Acids Res. 2018; 46:D315–D319.

21. You, Wu, Feng, Fu, Guo, Long, Zhang, Luan, Tian, Chen et al. APASdb: a database describing alternative poly(A) sites and selection of heterogeneous cleavage sites downstream of poly(A) signals. Nucleic Acids Res. 2015; 43:D59–D67.

22. Zhang, Hu, Recce, Tian. PolyA_DB: a database for mammalian mRNA polyadenylation. Nucleic Acids Res. 2005; 33:D116–D120.

23. Lee J.Y., Yeh, Park J.Y., Tian. PolyA_DB 2: mRNA polyadenylation sites in vertebrate genes. Nucleic Acids Res. 2007; 35:D165–D168.

24. Hong, Ruan, Zhang, Ye, Liu, Li, Jing, Zhang, Diao, Liang et al. APAatlas: decoding alternative polyadenylation across human tissues. Nucleic Acids Res. 2020; 48:D34–D39.

25. Bonfert, Friedel C.C. Prediction of Poly(A) sites by Poly(A) read mapping. PLoS One. 2017; 12:e0170914.

26. Chen, Ji, Fu, Lin, Ye, Su, Wu. A survey on identification and quantification of alternative polyadenylation sites from RNA-seq data. Brief. Bioinform. 2019; 21:1261–1276.

27. Shenker, Miura, Sanfilippo, Lai E.C. IsoSCM: improved and alternative 3′ UTR annotation using multiple change-point inference. RNA. 2015; 21:14–27.

28. Xia, Donehower L.A., Cooper T.A., Neilson J.R., Wheeler D.A., Wagner E.J., Li. Dynamic analyses of alternative polyadenylation from RNA-seq reveal a 3′-UTR landscape across seven tumour types. Nat. Commun. 2014; 5:5274.

29. Long, Ji, Li Q.Q., Wu. APAtrap: identification and quantification of alternative polyadenylation sites from RNA-seq data. Bioinformatics. 2018; 34:1841–1849.

30. Arefeen, Liu, Xiao, Jiang. TAPAS: tool for alternative polyadenylation site analysis. Bioinformatics. 2018; 34:2521–2529.

31. Katz, Wang E.T., Airoldi E.M., Burge C.B. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods. 2010; 7:1009–1015.

32. Grassi, Mariella, Lembo, Molineris, Provero. Roar: detecting alternative polyadenylation with standard mRNA sequencing libraries. BMC Bioinformatics. 2016; 17:423.

33. K.C.H., Blencowe B.J., Morris. QAPA: a new method for the systematic analysis of alternative polyadenylation from RNA-seq data. Genome Biol. 2018; 19:45.

34. Feng, Li, Wagner E.J., Li. TC3A: the Cancer 3′ UTR Atlas. Nucleic Acids Res. 2018; 46:D1027–D1030.

35. Kodama, Shumway, Leinonen, International Nucleotide Sequence Database Collaboration. The sequence read archive: explosive growth of sequencing data. Nucleic Acids Res. 2012; 40:D54–D56.

36. Sayers E.W., Beck, Brister J.R., Bolton E.E., Canese, Comeau D.C., Funk, Ketter, Kim, Kimchi et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2020; 48:D9–D16.

37. Lee C.M., Barber G.P., Casper, Clawson, Diekhans, Gonzalez J.N., Hinrichs A.S., Lee B.T., Nassar L.R., Powell C.C. et al. UCSC Genome Browser enters 20th year. Nucleic Acids Res. 2020; 48:D756–D761.

38. Yates A.D., Achuthan, Akanni, Allen, Alvarez-Jarreta, Amode M.R., Armean I.M., Azov A.G., Bennett et al. Ensembl 2020. Nucleic Acids Res. 2020; 48:D682–D688.

39. Kim, Langmead, Salzberg S.L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods. 2015; 12:357–360.

40. Liu, Downie, Liang, Ji, Li Q.Q., Hunt A.G. Genome-wide landscape of polyadenylation in Arabidopsis provides evidence for extensive alternative polyadenylation. Proc. Natl. Acad. Sci. U.S.A. 2011; 108:12533–12538.

41. Tian, Hu, Zhang, Lutz C.S. A large-scale analysis of mRNA polyadenylation of human and mouse genes. Nucleic Acids Res. 2005; 33:201–212.

42. Herrmann C.J., Schmidt, Kanitz, Artimo, Gruber A.J., Zavolan. PolyASite 2.0: a consolidated atlas of polyadenylation sites from 3′ end sequencing. Nucleic Acids Res. 2020; 48:D174–D179.

43. Neve, Patel, Wang, Louey, Furger A.M. Cleavage and polyadenylation: ending the message expands gene regulation. RNA Biol. 2017; 14:865–890.

44. Bailey T.L. DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics. 2011; 27:1653–1659.

45. Bailey T.L., Williams, Misleh, Li W.W. MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res. 2006; 34:W369–W373.

46. Beaudoing, Freier, Wyatt J.R., Claverie J.M., Gautheret. Patterns of variant polyadenylation signal usage in human genes. Genome Res. 2000; 10:1001–1010.

47. Z.L., Fritz E.R., Reecy J.M. AnimalQTLdb: a livestock QTL database tool set for positional QTL information mining and beyond. Nucleic Acids Res. 2007; 35:D604–D609.

48. Yang, Yang, Zhao, Yang, Wang, Yang, Niu, Gong. Animal-ImputeDB: a comprehensive database with multiple animal reference panels for genotype imputation. Nucleic Acids Res. 2020; 48:D659–D667.