Nucleic Acids Research
Home Systematic evaluation of the effects of genetic variants on PIWI-interacting RNA expression across 33 cancer types
Systematic evaluation of the effects of genetic variants on PIWI-interacting RNA expression across 33 cancer types
Systematic evaluation of the effects of genetic variants on PIWI-interacting RNA expression across 33 cancer types

The authors wish it to be known that, in their opinion, the first three authors should be regarded as Joint First Authors.

Article Type: research-article Article History
Abstract

PIWI-interacting RNAs (piRNAs) are an emerging class of non-coding RNAs involved in tumorigenesis. Expression quantitative trait locus (eQTL) analysis has been demonstrated to help reveal the genetic mechanism of single nucleotide polymorphisms (SNPs) in cancer etiology. However, there are no databases that have been constructed to provide an eQTL analysis between SNPs and piRNA expression. In this study, we collected genotyping and piRNA expression data for 10 997 samples across 33 cancer types from The Cancer Genome Atlas (TCGA). Using linear regression cis-eQTL analysis with adjustment of appropriate covariates, we identified millions of SNP-piRNA pairs in tumor (76 924 831) and normal (24 431 061) tissues. Further, we performed differential expression and survival analyses, and linked the eQTLs to genome-wide association study (GWAS) data to comprehensively decipher the functional roles of identified cis-piRNA eQTLs. Finally, we developed a user-friendly database, piRNA-eQTL (http://njmu-edu.cn:3838/piRNA-eQTL/), to help users query, browse and download corresponding eQTL results. In summary, piRNA-eQTL could serve as an important resource to assist the research community in understanding the roles of genetic variants and piRNAs in the development of cancers.

Xin,Du,Jiang,Wu,Ben,Zheng,Chu,Li,Zhang,and Wang: Systematic evaluation of the effects of genetic variants on PIWI-interacting RNA expression across 33 cancer types

INTRODUCTION

PIWI-interacting RNAs (piRNAs), a class of small non-coding RNAs with 24–31 nucleotides, are mainly expressed in the mammalian germline and have vital functions, including repressing the activity of transposable elements by binding to PIWI proteins (1–3). In addition, several studies found that piRNAs also occur and function in human somatic tissues (4), and investigated aberrant piRNA expression in some cancer types, indicating the potential roles of piRNAs in the development of human cancers (5–8).

It is known that single nucleotide polymorphisms (SNPs), the most common type of germline variants, play vital roles in human diseases, including cancers (9). In the past decade, genome-wide association studies (GWASs) have identified multiple SNPs associated with human cancers (9,10). Previous studies have found that these cancer risk-associated SNPs may be involved in the development of cancers by influencing the expression levels of nearby genes (10). Therefore, expression quantitative trait locus (eQTL) analysis, a method for linking SNPs to gene expression, has been demonstrated to be a powerful approach to understanding the effects and molecular mechanism of functional SNPs (11). Currently, multiple eQTL databases have been constructed for evaluating the effects of SNPs on gene expression (e.g. Genotype-Tissue Expression project [GTEx] and PancanQTL), DNA methylation (e.g. Pancan-meQTL), alternative splicing (e.g. CancerSplicingQTL) and other quantitative phenotypes (12–15). However, there is no database that provides an eQTL analysis between SNPs and piRNA expression. Therefore, it is necessary to construct a piRNA-eQTL database to further understand the functional roles of SNP-piRNA pairs in the biological processes of tumorigenesis.

Previous studies have found that somatic mutations also played an important role in the development of cancer (16). In addition to mutations in protein-coding regions, these studies described the landscape of non-coding mutations in cancer, particularly in promoter and enhancer regions, and their role in regulating gene expression and protein functions (16,17). Therefore, it is noteworthy that somatic mutations would also affect gene expression extending in piRNAs (18). However, in this study, we primarily aimed to investigate the effects of germline variants on piRNA expression using The Cancer Genome Atlas (TCGA) program, and developed a user-friendly database for cis-piRNA eQTL analysis across 33 cancer types.

MATERIALS AND METHODS

Genotype data collection, imputation and processing

We obtained access to the raw genotype data from TCGA database (https://tcga-data.nci.nih.gov/tcga/), which included 906 600 SNPs using the Affymetrix SNP 6.0 array. We subsequently imputed the non-genotyped SNPs from normal blood or normal tissue samples based on the 1000 Genomes Project (Phase I, version 3, 1092 individuals) using IMPUTE2 (19). GTOOL was used to convert imputed data into the PLINK format with a threshold of 0.9. A series of filtering criteria for SNPs (including non-imputed and imputed genotypes) on autosomal chromosomes were then carried out as follows: (i) minor allele frequency (MAF) < 0.05; (ii) call rate < 95%; (iii) Hardy–Weinberg Equilibrium (HWE) P-value < 1 × 10−6 and (iv) imputation confidence score (info score) < 0.3 (Supplementary Materials; Figure 1).

Summary of the study design.
Figure 1.

Summary of the study design.

piRNA expression data collection and processing

Raw small RNA sequencing data were also obtained from TCGA database, and we recreated the raw FASTQ files based on the BAM file using bedtools2 (20). Subsequently, the FASTQ files were trimmed based on the criterion of ‘Phred quality score ≥ 20’ and ‘reads length ≥ 21 nucleotides’ to obtain high quality reads corresponding to piRNAs via FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) (6). All reads were then realigned to the reference genome (hg38) using STAR with custom piRNA reference transcriptome data from the piRBase database (version 2.0, http://www.regulatoryrna.org/database/piRNA/) (21), and the counts of each piRNA were summarized using featureCounts (22). In the piRBase database, all piRNA sequences were mapped to its latest genome using Bowtie software with parameter ‘-v 1 -a –best –strata’ in order to obtain the potential origin of every piRNA. piRNAs were referred to as gene- or repeat-related according to the overlapping of piRNA genome loci with RefSeq genes or repeat elements. Considering that several piRNAs may have different locations, we used counts per million mapped reads (CPM) and transcripts per million reads (TPM) to measure the total level (total number: 8 123 075) and transcript level (total number: 12 207 141) of each piRNA, respectively. Only the piRNAs with a CPM or TPM ≥ 1 and that were expressed in ≥ 20% of samples were retained and transformed using log2(x + 1) for further analyses.

Identification of cis-piRNA eQTLs

For each cancer type, we merged the genotype and piRNA expression data (transcript level) to perform eQTL analysis in tumor and normal tissues with more than 15 samples. The piRNA location (hg38) was transformed to match the location (hg19) of genotype data by LiftOver (https://genome.sph.umich.edu/wiki/LiftOver).

We performed the cis-piRNA eQTL analysis using R package Matrix eQTL in a linear regression model with the adjustment of sex, age, principal components (PCs, Supplementary Materials) and probabilistic estimation of expression residuals (PEER) factors (23). The top 5 PCs (extracted from genotype data using EIGENSOFT), and PEER factors (30 and 5 factors calculated using PEER from tumor and normal expression data, respectively) served as covariates in the model (24,25). Cis-eQTLs were defined if the SNP was within 1 Mb from the location of the piRNA. To retain more potential SNPs associated with piRNA expression, SNPs with P-value < 0.05 were defined as eSNPs (also known as eQTLs), and the corresponding piRNAs were defined as epiRNAs. Additionally, to control the type I error, we also used the false discovery rate (FDR) for multiple testing with the p.adjust function in R software.

Differential expression and survival analyses

Student's t-tests for independent (i.e. unpaired) and paired samples were performed to compare the expression of piRNAs (total level) between tumor and tumor-adjacent normal tissues. Furthermore, we carried out survival analysis to evaluate the associations of piRNAs (total level) and eSNPs with the overall survival probability in tumor samples. A log-rank test and Kaplan–Meier (KM) curve were used to examine the prognostic differences among different subgroups stratified by piRNA expression (high versus low level with different thresholds) or genotype (homozygous genotype AA versus heterozygous genotype AB and homozygous genotype BB).

Identification of GWAS-associated eSNPs

We included all GWAS-identified SNPs from the GWAS catalog (http://www.ebi.ac.uk/gwas/, August 2019) (26), and extended the GWAS-associated linkage disequilibrium (LD) SNPs based on the 1000 Genomes Utah Residents with Northern and Western European Ancestry (CEU) population using PLINK 1.90 (ld-window-kb 500 –ld-window-r2 0.5). GWAS-associated eSNPs were defined as the eSNPs overlapping with GWAS-identified SNPs and LD SNPs.

Pan-cancer analysis

We also designed a ‘Pan-cancer analysis’ page, where users can submit a batch of SNPs and/or piRNAs to (i) investigate the piRNA expression levels across 33 cancer types; (ii) display all significant SNP-piRNA pairs across 33 cancer types; (iii) identify other quantitative phenotypes associated with the eSNPs by combination of the PancanQTL, Pancan-meQTL and ncRNA-eQTL databases (13,14,27); and (iv) find published cancer-associated piRNAs by referring to the ‘cancer related data’ module of the piRBase database (21).

RESULTS

Summary of piRNA expression and genotype data

We collected 10 997 samples with small RNA sequencing data from 33 cancer types, and the sample size ranged from 5 in glioblastoma multiforme (GBM) to 1207 in breast invasive carcinoma (BRCA; Supplementary Table S1).

For the total level of piRNAs, there was an average of 19 430 piRNAs for each cancer type, ranging from 10 031 for kidney chromophobe (KICH) to 40 642 for acute myeloid leukemia (LAML; Table 1). Among these piRNAs, we identified an average of 11 608 (P for independent samples < 0.05) or 10 074 (P for paired samples < 0.05) differentially expressed (DE) piRNAs. In addition, an average of 2987 survival-associated piRNAs (P-value for log-rank test < 0.05) were identified, most of which were also differentially expressed in the majority of cancers.

Table 1.
Summary of the piRNA expression data in each cancer type
Cancer typeNo. of samplesNo. of piRNAsaNo. of DE piRNAsbNo. of survival-piRNAsc
TumorNormalIndependentPaired
ACC80020 185--5933
BLCA4091914 898992892531079
BRCA107810420 10217 84315 2581874
CESC307315 180659211372146
CHOL36915 78213 23511 540496
COAD444819 92814 43410 1772620
DLBC47021 343--261
ESCA1841317 93311 37410 9761317
GBM0521 070---
HNSC5234421 15915 40213 6031087
KICH662510 03181657530550
KIRC5167118 91218 37616 4756156
KIRP2913413 995955911 7091589
LAML*103040 642--7460
LGG512025 828--16 439
LIHC3725019 56815 99915 3586150
LUAD5134615 35412 47592705228
LUSC4784520 20218 98114 598531
MESO87014 294--5351
OV489027 004--2044
PAAD178410 840162116603563
PCPG179315 033911160612134
PRAD4945214 31613 69513 51390
READ161321 37914 7727247505
SARC259014 311--3401
SKCM98226 4281068-2010
STAD4364116 35813 30513 4651319
TGCT150025 543--506
THCA5065916 24810 77799961716
THYM124221 76351393293893
UCEC5383316 79313 53112 3952833
UCS57023 023--787
UVM80025 753--4508

aThe number of piRNAs in the total level.

bDifferentially expressed genes, P-value for Student's t-test < 0.05.

c P-value for log-rank test <0.05 based on the median value of piRNA expression.

*Primary blood derived cancer-peripheral blood.

For the transcript level of piRNAs, the piRNA expression data were merged with genotype data for the eQTL analysis in tumor and normal samples, respectively. The matched sample size is summarized in Table 2, ranging from 36 for cholangiocarcinoma (CHOL) to 989 for BRCA in tumor samples, and from 2 for skin cutaneous melanoma (SKCM) or thymoma (THYM) to 69 for kidney renal clear cell carcinoma (KIRC) in normal tissues. After quality control, an average of 3.6 million SNPs and 18 543 piRNAs were obtained in each cancer type.

Table 2.
Summary of epiRNAs and eSNPs for each cancer type in TCGA
Cancer typeNo. of samplesNo. of SNPsNo. of piRNAsaTumorbNormalb
TumorNormalTumorNormalPairsepiRNAseSNPsSurvival-eSNPscGWAS-eSNPsPairsepiRNAseSNPsSurvival-eSNPscGWAS- eSNPs
ACC7803 481 471-19 7822 525 25419 05326 6041 34611 666-----
BLCA407193 650 5793 219 23114 3151 977 10613 69130 3621 64512 5951 445 69913 69121 8999608 246
BRCA989463 517 7213 622 28519 6602 782 78818 79933 8591 70212 9012 552 73218 80122 5631 0699 085
CESC28733 684 885-14 7081 994 42714 00126 7561 1589 133-----
CHOL3693 400 798-15 2981 676 96114 76721 7396527 449-----
COAD41583 649 339-18 6902 577 00917 73736 1951 54512 839-----
DLBC4303 626 087-20 8982 615 69520 09127 9861 4669 845-----
ESCA181133 718 825-17 1842 098 67516 54228 4102 00310 594-----
GBM00-------------
HNSC507443 666 1673 732 54419 8662 509 60018 92735 6802 51513 6352 148 74818 85427 6541 4681 1236
KICH66253 649 1263 711 5329 3291 154 3648 75115 7599886 4261 074 1978 67516 6467797 066
KIRC512693 515 7463 690 13917 8852 513 85716 87431 8661 50811 3902 064 15217 03325 8841 13610 942
KIRP288343 671 4973 466 16813 1311 593 69712 70324 7879228 3291 789 07212 56918 2986207 580
LAML*10303 655 032-39 9345 012 06838 81156 1893 23521 664-----
LGG50703 646 049-25 1863 600 78924 00050 5782 44918 402-----
LIHC372503 556 5813 624 42219 0642 575 61418 31533 0452 84912 0772 363 85018 22230 7082 86611 503
LUAD509463 645 9173 709 19113 5111 731 61512 91927 3051 2059 6391 412 29212 99419 5398086 608
LUSC477453 597 9123 691 32618 8882 455 75118 04935 4081 81213 3102 485 95218 04930 7321 29411 034
MESO8703 608 334-14 1881 569 97213 61921 3181 0197 744-----
OV48203 675 726-25 2264 056 11824 21346 8253 25018 127-----
PAAD17243 646 425-10 6651 201 12510 17123 8469568 298-----
PCPG17933 605 369-13 8401 624 62513 44725 7571 2529 524-----
PRAD492523 640 6333 539 85913 8541 796 30913 40228 3361 06810 1381 487 46213 40221 5777558 432
READ14633 658 240-20 1472 602 27719 10735 8341 68714 072-----
SARC25703 564 336-13 0841 433 79812 46423 2331 3318 977-----
SKCM9823 649 256-25 4852 938 72024 20636 9262 26614 484-----
STAD411403 656 6103 961 24815 7971 887 47515 19031 1551 52011 9822 196 78215 19027 5071 0699 800
TGCT15003 680 238-24 5843 489 87523 67839 8631 43314 919-----
THCA501583 672 2893 483 00414 9972 141 81314 39235 9851 62813 6651 787 16314 42326 3371 42110 122
THYM12223 679 062-21 2402 649 80220 32933 5891 61012 151-----
UCEC519323 623 1093 491 65815 4572 181 79414 87532 0231 51010 9371 622 96014 83523 8769028 606
UCS5503 385 763-22 2302 391 16621 31827 7221 7939 757-----
UVM8003 769 109-25 2593 564 69224 24338 1241 86314 003-----

aThe number of piRNAs in the transcript level.

b P-value for eQTL analysis < 0.05.

c P-value for log-rank test < 0.05.

*Primary blood derived cancer-peripheral blood.

Identification and exploration of cis-piRNA eQTLs

For each cancer type, the average associations of 2.4 and 1.8 million cis SNP-piRNA pairs were identified in tumor and normal samples, respectively. For tumor tissues, we identified 76 924 831 significant pairs in 32 cancer types (P-value < 0.05), ranging from 1 154 364 pairs for KICH (with 8751 epiRNAs and 15 759 eSNPs) to 5 012 068 pairs for LAML (with 38 811 epiRNAs and 56 189 eSNPs; Table 2). For normal tissues, we identified 24 431 061 significant pairs in 13 cancer types (P-value < 0.05), ranging from 1 074 197 pairs for KICH (with 8675 epiRNAs and 16 646 eSNPs) to 2 552 732 pairs for BRCA (with 18 801 epiRNAs and 22 563 eSNPs; Table 2). By comparing tumor and normal SNP-piRNA pairs in 13 cancer types, we only identified an average of 117 681 shared pairs, indicating the distinct differences of cis-piRNA eQTL results in human tumor and normal tissues. In addition, to better control the type I error, we provided the P-value at FDR = 0.1 for each cancer in Supplementary Table S2, where we identified 1 660 892 significant SNP-piRNA pairs (FDR ≤ 0.1) in tumor tissues and 114 511 pairs (FDR ≤ 0.1) in normal tissues.

In addition, among these potential eSNPs identified in tumor tissues, we found 53 186 survival-associated eSNPs (ranging from 652 for CHOL to 3250 for ovarian serous cystadenocarcinoma [OV]) and 380 672 GWAS-associated eSNPs (ranging from 6426 for KICH to 21 664 for LAML) in different cancer types. For normal tissues, we identified 15 147 survival-associated eSNPs (ranging from 620 for kidney renal papillary cell carcinoma [KIRP] to 2866 for liver hepatocellular carcinoma [LIHC]) and 120 260 GWAS-associated eSNPs (ranging from 6608 for lung adenocarcinoma [LUAD] to 11 503 for LIHC) in different cancer types.

Web design and interface

Based on the above results, we constructed a user-friendly database, piRNA-eQTL (http://njmu-edu.cn:3838/piRNA-eQTL/, which can also be accessed at http://222.190.246.206:3838/piRNA-eQTL/), using R package Shiny. In this database, we designed four modules to display the results of cis-piRNA eQTLs (tumor and normal tissues), differential expression analysis, survival analysis and GWAS-related eQTLs (tumor and normal tissues), respectively (Figure 1). Users can browse each module simply by clicking the corresponding module. For example, the user can select a cancer type, and input an SNP ID and piRNA name of interest to search for corresponding results in the four modules (Figure 2A). In addition, we also designed a ‘Pan-cancer analysis’ page, where users can submit a batch of SNPs and/or piRNAs to perform pan-cancer analysis (Supplementary Figure S1). The ‘About’ page provides more details about the function of this database.

Overview of piRNA-eQTL database. (A) Advanced search box. (B) Example of eQTL boxplots on the ‘Cis-eQTLs’ page. (C) Example of differentially expressed boxplots on the ‘Differential expression analysis’ page. (D, E) Example of Kaplan–Meier (KM) plots on the ‘Survival analysis’ page.
Figure 2.

Overview of piRNA-eQTL database. (A) Advanced search box. (B) Example of eQTL boxplots on the ‘Cis-eQTLs’ page. (C) Example of differentially expressed boxplots on the ‘Differential expression analysis’ page. (DE) Example of Kaplan–Meier (KM) plots on the ‘Survival analysis’ page.

Data browsing and querying of the four modules

On the ‘cis-eQTLs (Tumor)’ or ‘cis-eQTLs (Normal)’ page, a table with the chromosome ID, SNP ID, SNP genomic position, SNP alleles, piRNA name, statistic, beta value (effect size of the SNP on piRNA expression) and eQTL P-value is displayed on this page. When the user selects a cancer type or enters a piRNA name or SNP ID, the table will be rebuilt to display the query results. Users can download the results of cis-piRNA eQTLs for each cancer type by clicking the ‘Download’ button. In addition, users can select one SNP-piRNA pair and click the ‘Plot’ button, and a vector diagram of the boxplot is provided to display the association between the SNP genotypes and piRNA expression. For example, our analysis showed that piR-hsa-1945036 expression in individuals carrying the rs8018979 genotype AA (i.e. GG) is significantly lower than that in individuals carrying the rs8018979 genotype AB (i.e. GA) or BB (i.e. AA) in bladder urothelial carcinoma (BLCA) tumor tissues (P = 0.024); however, there was a reverse eQTL association in BLCA normal tissues (P = 0.043; Figure 2B).

On the ‘Differential expression analysis’ page, the search boxes are designed for retrieving the specific cancer type and piRNA. A table with comparison type (independent and paired Student's t-test), piRNA name, mean CPM value in tumor tissues, mean CPM value in normal tissues, fold change, statistic and Student's t-test P-value is provided. In addition, two boxplot diagrams are used to display the difference in the piRNA expression between independent and paired tumor and normal samples. For example, the expression of piR-hsa-1945036 in tumor tissues was significantly higher than that in normal tissues for BLCA (P for independent samples = 7.03 × 10−6; P for paired samples = 7.70 × 10−4, Figure 2C).

On the ‘Survival analysis’ page, the search boxes are designed for retrieving the specific cancer type, piRNA or eSNP. For piRNA, a table with the piRNA name, median survival time (months) for high- and low-expressed groups, and log-rank P-value is provided. Users can also select a different threshold value (i.e., percentile) from the slider box to split patients into high- and low-expressed groups. For eSNP, a table with the SNP ID, median survival time (months) for patients with different genotypes, and log-rank P-value is provided. Additionally, two diagrams of KM plot are provided to display the associations of piRNA expression and SNP genotypes with the overall survival probability. For example, our analysis showed that BLCA patients with the eSNP rs7175451 AB (i.e. TA) or BB (i.e. AA) genotypes have shorter survival time than patients with rs7175451 AA (i.e. TT) genotype (P for log-rank test = 1.76 × 10−5, Figure 2D). Higher expression of piR-hsa-135916 was significantly associated with a worse prognosis of BLCA patients (P for log-rank test = 6.64 × 10−5, Figure 2E).

On the ‘GWAS-eQTLs (Tumor)’ or ‘GWAS-eQTLs (Normal)’ page, a table with the SNP information, regulated piRNA and related GWAS traits is displayed. Search boxes are designed for retrieving specific cancer types, SNPs and piRNAs. In addition, users can select a different LD threshold value from the slider box to explore more potential eSNPs associated with GWAS traits. For example, the BRCA-associated eSNP rs7175451 was in the LD region of rs7170930 (r2 = 0.855), which was a potential GWAS-identified SNP for response to cytadine analogues (cytosine arabinoside) (28).

Pan-cancer analysis

On the ‘Pan-cancer analysis’ page, we provided three modules, including ‘Pan-cancer piRNA expression profile’, ‘Summary of pan-cancer eQTL analysis (Tumor)’ and ‘Summary of pan-cancer eQTL analysis (Normal)’ (Supplementary Figure S1). In the ‘Pan-cancer piRNA expression profile’ module, the search boxes are designed for retrieving a batch of piRNAs. A table with the piRNA name, tissue type, mean CPM value in 33 cancer types and P-value for the ANOVA test is provided. In addition, two boxplot diagrams are used to display the piRNA expression level across 33 cancer types in tumor and normal samples, respectively. Furthermore, users can search cancer-specific cis-piRNA eQTL results by selecting a cancer type. For example, the expression profile of piR-has-317 is shown in Supplementary Figure S2A, which can help users better investigate the piR-has-317 expression level in 33 cancer types; additionally, users can search cancer-specific piR-hsa-317-associated eQTL results by selecting a cancer type (e.g. BLCA; Supplementary Figure S2B).

In the ‘Summary of pan-cancer eQTL analysis (Tumor) ’ or ‘Summary of pan-cancer eQTL analysis (Normal)’ modules, the search boxes are designed for retrieving a batch of SNPs and/or piRNAs to display the results of three sub-modules, including ‘Summary of pan-cancer eQTL results’, ‘eSNP-associated quantitative phenotypes’ and ‘Cancer-associated piRNAs’. For the ‘Summary of pan-cancer eQTL results’ module, a table with the cancer type, chromosome ID, SNP ID, SNP genomic position, SNP alleles, piRNA name, statistic, beta value (effect size of the SNP on piRNA expression) and eQTL P-value is displayed, and a boxplot diagram is used to display the significant eQTL pairs across 33 cancer types. For example, all significant SNP-piR-hsa-317 pairs in 33 cancer types are shown in Supplementary Figure S3.

For the ‘eSNP-associated quantitative phenotypes’ module, a table with the cancer type, chromosome ID, SNP ID, SNP genomic position, SNP alleles, piRNA name, statistic, beta value (effect size of the SNP on piRNA expression), eQTL P-value, phenotype source, SNP alleles for phenotype, phenotype type, phenotype name, beta value (effect size of the SNP on quantitative phenotypes) and eQTL P-value for the phenotype is displayed, and a boxplot diagram is used to display the number of quantitative phenotypes across 33 cancer types. As shown in Supplementary Figure S4, the eSNPs associated with piR-hsa-317 were associated with multiple quantitative phenotypes (including genes, lncRNAs and CpG sites) in 33 cancer types. For example, the rs10823260 C allele was associated with increased expression levels of piR-hsa-317 (beta = 0.030; P = 0.019) and STOX1 (beta = 0.24; P = 4.51 × 10−8) in BRCA tumor tissues.

For the ‘Cancer-associated piRNAs’ module, a table with the cancer type, chromosome ID, SNP ID, SNP genomic position, SNP alleles, piRNA name, statistic, beta value, eQTL P-value, piRNA associated cancer and related PubMed ID is displayed. For example, the rs10215854 A allele was associated with a decreased expression level of piR-hsa-29218 in BLCA tumor tissues; besides, piR-hsa-29218 was also reported to play a crucial role in the development of bladder cancer (7).

DISCUSSION

In this study, we systematically performed cis-eQTL, differential expression, survival and GWAS-eQTL analyses by combining piRNA expression and genotype data in 33 cancer types. Finally, we constructed a user-friendly database called piRNA-eQTL for users to query, browse and download corresponding results. Millions of tables and plots (e.g. boxplots for eQTL and differential expression analyses, and KM plots for survival analysis) are provided in this online database.

Compared to previous eQTL databases, our database has several strengths. First, this is the first eQTL database to systematically evaluate the effects of genetic variants on piRNA expression across 33 cancer types. Second, considering the distinct differences in eQTLs between tumor and normal tissues (29), we provide cis-piRNA eQTL results for both tumor and normal tissues, which can help users better identify cancer-specific eQTLs. Third, we also provide the ‘differential expression analysis’ and ‘survival analysis’ modules, which are useful to help understand the potential roles of piRNAs in the development of cancers. Fourth, we used the piRBase database as our reference data. Compared to previous piRNA reference databases (e.g. fRNAdb and piRNABank) (30,31), piRBase is the first database that systematically integrates various piRNA associated data to support piRNA functional analysis, and the numbers of piRNAs have been increased. In addition, a major limitation of this database is that the piRNA expression level may not be very accurate because these data are obtained from small RNA sequencing (miRNA-Seq) data, and small RNA-Seq data are not enriched for piRNAs neither enriched for any other class. Notably, piRNA-specific RNA-Seq data are needed for further studies in non-coding RNA areas. Additionally, the sample size of normal tissues is limited for some cancer types. In particular, eQTL results with sample sizes <100 should be interpreted with caution; therefore, in our future studies, we will update the piRNA-eQTL database to provide more accurate cis-piRNA eQTL results with sufficient sample size by incorporating other databases (e.g. Gene Expression Omnibus [GEO] dataset). Furthermore, given that previous studies have demonstrated the influence of somatic mutations (e.g. single-nucleotide variants [SNVs], small insertions and deletions, genomic rearrangements and structural variations) on gene expression (32), a systematic analysis between somatic mutations and piRNA expression needs to be further performed and incorporated into this database.

CONCLUSION

In summary, piRNA-eQTL is the first online database for providing cis-piRNA eQTL results by integrating genotype and piRNA expression data across 33 cancer types, and this database could serve as an important resource to assist the research community in understanding the roles of genetic variants and piRNAs in the development of human cancers.

DATA AVAILABILITY

The raw genotype and small RNA sequencing data have been deposited in The Cancer Genome Atlas (TCGA) program. All other relevant data are available on the piRNA-eQTL website.

ACKNOWLEDGEMENTS

We thank The Cancer Genome Atlas (TCGA) program for sharing raw small RNA sequencing and genotype data.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

National Natural Science Foundation of China [81822039, in part]; Priority Academic Program Development of Jiangsu Higher Education Institutions (Public Health and Preventive Medicine). Funding for open access charge: None.

Conflict of interest statement. None declared.

REFERENCES

1. 

Girard A., Sachidanandam R., Hannon G.J., Carmell M.A. A germline-specific class of small RNAs binds mammalian Piwi proteins. Nature. 2006; 442:199202.

2. 

Lau N.C., Seto A.G., Kim J., Kuramochi-Miyagawa S., Nakano T., Bartel D.P., Kingston R.E. Characterization of the piRNA complex from rat testes. Science. 2006; 313:363367.

3. 

Vagin V.V., Sigova A., Li C., Seitz H., Gvozdev V., Zamore P.D. A distinct small RNA pathway silences selfish genetic elements in the germline. Science. 2006; 313:320324.

4. 

Yan Z., Hu H.Y., Jiang X., Maierhofer V., Neb E., He L., Hu Y., Hu H., Li N., Chen W.et al. Widespread expression of piRNA-like molecules in somatic tissues. Nucleic Acids Res.2011; 39:65966607.

5. 

Muller S., Raulefs S., Bruns P., Afonso-Grunz F., Plotner A., Thermann R., Jager C., Schlitter A.M., Kong B., Regel I.et al. Next-generation sequencing reveals novel differentially regulated mRNAs, lncRNAs, miRNAs, sdRNAs and a piRNA in pancreatic cancer. Mol. Cancer. 2015; 14:94.

6. 

Mai D., Ding P., Tan L., Zhang J., Pan Z., Bai R., Li C., Li M., Zhou Y., Tan W.et al. PIWI-interacting RNA-54265 is oncogenic and a potential therapeutic target in colorectal adenocarcinoma. Theranostics. 2018; 8:52135230.

7. 

Chu H., Hui G., Yuan L., Shi D., Wang Y., Du M., Zhong D., Ma L., Tong N., Qin C.et al. Identification of novel piRNAs in bladder cancer. Cancer Lett.2015; 356:561567.

8. 

Liu Y., Dou M., Song X., Dong Y., Liu S., Liu H., Tao J., Li W., Yin X., Xu W. The emerging role of the piRNA/piwi complex in cancer. Mol. Cancer. 2019; 18:123.

9. 

Visscher P.M., Wray N.R., Zhang Q., Sklar P., McCarthy M.I., Brown M.A., Yang J. 10 Years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet.2017; 101:522.

10. 

Sud A., Kinnersley B., Houlston R.S. Genome-wide association studies of cancer: current insights and future perspectives. Nat. Rev. Cancer. 2017; 17:692704.

11. 

Gilad Y., Rifkin S.A., Pritchard J.K. Revealing the architecture of gene regulation: the promise of eQTL studies. Trends Genet.2008; 24:408415.

12. 

The GTEx ConsortiumArdlie K.G., Deluca D.S., Segre A.V., Sullivan T.J., Young T.R., Gelfand E.T., Trowbridge C.A., Maller J.B., Tukiainen T.et al. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science. 2015; 348:648660.

13. 

Gong J., Mei S., Liu C., Xiang Y., Ye Y., Zhang Z., Feng J., Liu R., Diao L., Guo A.Y.et al. PancanQTL: systematic identification of cis-eQTLs and trans-eQTLs in 33 cancer types. Nucleic Acids Res.2018; 46:D971D976.

14. 

Gong J., Wan H., Mei S., Ruan H., Zhang Z., Liu C., Guo A.Y., Diao L., Miao X., Han L. Pancan-meQTL: a database to systematically evaluate the effects of genetic variants on methylation in human cancer. Nucleic Acids Res.2019; 47:D1066D1072.

15. 

Tian J., Wang Z., Mei S., Yang N., Yang Y., Ke J., Zhu Y., Gong Y., Zou D., Peng X.et al. CancerSplicingQTL: a database for genome-wide identification of splicing QTLs in human cancer. Nucleic Acids Res.2019; 47:D909D916.

16. 

Fredriksson N.J., Ny L., Nilsson J.A., Larsson E. Systematic analysis of noncoding somatic mutations and gene expression alterations across 14 tumor types. Nat. Genet.2014; 46:12581263.

17. 

Weinhold N., Jacobsen A., Schultz N., Sander C., Lee W. Genome-wide analysis of noncoding regulatory mutations in cancer. Nat. Genet.2014; 46:11601165.

18. 

Calabrese C., Davidson N.R., Demircioglu D., Fonseca N.A., He Y., Kahles A., Lehmann K.V., Liu F., Shiraishi Y., Soulette C.M.et al. Genomic basis for RNA alterations in cancer. Nature. 2020; 578:129136.

19. 

Howie B., Fuchsberger C., Stephens M., Marchini J., Abecasis G.R. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat. Genet.2012; 44:955959.

20. 

Quinlan A.R., Hall I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010; 26:841842.

21. 

Wang J., Zhang P., Lu Y., Li Y., Zheng Y., Kan Y., Chen R., He S. piRBase: a comprehensive database of piRNA sequences. Nucleic Acids Res.2019; 47:D175D180.

22. 

Liao Y., Smyth G.K., Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014; 30:923930.

23. 

Shabalin A.A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012; 28:13531358.

24. 

Price A.L., Patterson N.J., Plenge R.M., Weinblatt M.E., Shadick N.A., Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet.2006; 38:904909.

25. 

Stegle O., Parts L., Piipari M., Winn J., Durbin R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc.2012; 7:500507.

26. 

Buniello A., MacArthur J., Cerezo M., Harris L.W., Hayhurst J., Malangone C., McMahon A., Morales J., Mountjoy E., Sollis E.et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res.2019; 47:D1005D1012.

27. 

Li J., Xue Y., Amin M.T., Yang Y., Yang J., Zhang W., Yang W., Niu X., Zhang H.Y., Gong J. ncRNA-eQTL: a database to systematically evaluate the effects of SNPs on non-coding RNA expression across cancer types. Nucleic Acids Res.2020; 48:D956D963.

28. 

Li L., Fridley B.L., Kalari K., Niu N., Jenkins G., Batzler A., Abo R.P., Schaid D., Wang L. Discovery of genetic biomarkers contributing to variation in drug response of cytidine analogues using human lymphoblastoid cell lines. BMC Genomics. 2014; 15:93.

29. 

Ongen H., Andersen C.L., Bramsen J.B., Oster B., Rasmussen M.H., Ferreira P.G., Sandoval J., Vidal E., Whiffin N., Planchon A.et al. Putative cis-regulatory drivers in colorectal cancer. Nature. 2014; 512:8790.

30. 

Mituyama T., Yamada K., Hattori E., Okida H., Ono Y., Terai G., Yoshizawa A., Komori T., Asai K. The functional RNA database 3.0: databases to support mining and annotation of functional RNAs. Nucleic Acids Res.2009; 37:D89D92.

31. 

Sai L.S., Agrawal S. piRNABank: a web resource on classified and clustered Piwi-interacting RNAs. Nucleic. Acids. Res.2008; 36:D173D177.

32. 

Ding J., McConechy M.K., Horlings H.M., Ha G., Chun C.F., Funnell T., Mullaly S.C., Reimand J., Bashashati A., Bader G.D.et al. Systematic analysis of somatic mutations impacting gene expression in 12 tumour types. Nat. Commun.2015; 6:8554.