Individual variations of white matter (WM) tracts are known to be associated with various cognitive and neuropsychiatric traits. Diffusion tensor imaging (DTI) and genome-wide single-nucleotide polymorphism (SNP) data from 17,706 UK Biobank participants offer the opportunity to identify novel genetic variants of WM tracts and explore their genetic overlap with other brain-related complex traits. We analyzed the genetic architecture of 110 tract-based DTI parameters, carried out genome-wide association studies (GWAS), and performed post-GWAS analyses, including association lookups, gene-based association analysis, functional gene mapping, and genetic correlation estimation. We found that DTI parameters are substantially heritable for all WM tracts (mean heritability 48.7%). We observed a highly polygenic architecture of genetic influence across the genome (p-value=1.67*10−05) as well as enrichment of genetic effects for SNPs annotated as active in central nervous system cells (p-value=8.95*10−12). GWAS identified 213 independent significant SNPs associated with 90 DTI parameters (696 SNP-level and 205 locus-level associations; p-value<4.5*10−10, adjusted for testing multiple phenotypes). Gene-based association analysis prioritized 112 significant genes, most of which are novel. More importantly, association lookups found that many of the novel SNPs and genes of DTI parameters have previously been implicated in cognitive and mental health traits. In conclusion, the present study identifies many new genetic variants at the SNP, locus, and gene levels for the integrity of brain WM tracts and provides an overview of their pleiotropy with cognitive and mental health traits.
Complex brain functions rely on dynamic interactions between distributed brain areas operating in large-scale networks. Consequently, the integrity of white matter connections between brain areas is critical to proper function. Microstructural differences in white matter (WM) tracts are phenotypically associated with information processing speed and intelligence 1–4 as well as with neurodegenerative and neuropsychiatric traits, such as Alzheimer’s disease 5, Parkinson’s disease 6, schizophrenia (SCZ) 7, and attention-deficit/hyperactivity disorder (ADHD) 8. A better understanding of the genetic factors influencing the integrity of WM tracts could have important implications for understanding the etiology of these diseases as well as individual variation in intelligence. To reveal the genetic contributions to brain structural development and disease processes, imaging genetics studies of WM microstructure have been an active research area over the past fifteen years. Structural changes of WM tracts are typically measured and quantified with diffusion tensor imaging (DTI) 9. Diffusion in the brain can be influenced by many aspects of tissue micro- and macrostructure 10. To reconstruct WM pathways and tissue microstructure, DTI models the diffusion properties of WM from the random movement of water molecules. Specifically, DTI fits a tensor model to diffusion magnetic resonance imaging (dMRI) data and characterizes diffusion in all directions. A typical DTI analysis diagonalizes the tensor and calculates three eigenvalue/eigenvector pairs that represent one primary and two secondary diffusion directions. Within each voxel, several DTI parameters can be derived: fractional anisotropy (FA), mean diffusivity (MD), axial diffusivity (AD), radial diffusivity (RD), and mode of anisotropy (MO). FA is a summary measure of WM integrity 11; higher FA indicates stronger directionality of diffusion in that voxel.
MD is the average of the three eigenvalues and quantifies the overall magnitude of diffusion regardless of direction, AD is the eigenvalue of the principal direction, RD is the average of the eigenvalues of the two secondary directions, and MO is derived from the third moment of the tensor. Positive MO reflects narrow tubular water diffusion, whereas a negative value denotes planar water diffusion 12. There are several approaches to analyzing DTI data across the whole brain, including manual region-of-interest (ROI) analysis, automated ROI analysis, voxel-based analysis such as tract-based spatial statistics (TBSS) 13, and tractography and graph theory analysis; see Tamnes, Roalf 14 for a survey.
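As an illustration of these definitions, the following Python sketch computes the five DTI parameters for a single voxel from the three tensor eigenvalues sorted in descending order. FA, MD, AD, and RD follow their standard eigenvalue definitions. For MO we assume the common normalized form 3√6·det(D̃/‖D̃‖), where D̃ is the deviatoric (anisotropic) part of the tensor; this ranges from −1 (planar) to +1 (linear). The function name and the convention of returning 0 for MO in a perfectly isotropic voxel are our own choices, not part of the ENIGMA-DTI pipeline, and the exact formulas used in this study are those given in the supplementary information.

```python
import math


def dti_scalars(l1: float, l2: float, l3: float) -> dict[str, float]:
    """FA, MD, AD, RD and MO of one voxel from tensor eigenvalues l1 >= l2 >= l3.

    Sketch only: MO uses the assumed normalized-mode definition
    3*sqrt(6)*det(D_dev / ||D_dev||), computed here from eigenvalues.
    """
    md = (l1 + l2 + l3) / 3.0            # mean diffusivity: mean of the eigenvalues
    ad = l1                              # axial diffusivity: principal eigenvalue
    rd = (l2 + l3) / 2.0                 # radial diffusivity: mean of secondary eigenvalues
    dev = (l1 - md, l2 - md, l3 - md)    # eigenvalues of the deviatoric (anisotropic) part
    dev_sq = sum(d * d for d in dev)
    total_sq = l1 * l1 + l2 * l2 + l3 * l3
    # FA = sqrt(3/2) * ||deviatoric|| / ||tensor||
    fa = math.sqrt(1.5 * dev_sq / total_sq) if total_sq > 0 else 0.0
    if dev_sq > 0:
        # det of the normalized deviatoric tensor = product of its eigenvalues / norm^3
        mo = 3.0 * math.sqrt(6.0) * (dev[0] * dev[1] * dev[2]) / dev_sq ** 1.5
    else:
        mo = 0.0                         # mode is undefined for an isotropic tensor (assumption: report 0)
    return {"FA": fa, "MD": md, "AD": ad, "RD": rd, "MO": mo}
```

For example, `dti_scalars(1.0, 0.0, 0.0)` describes purely linear diffusion and yields FA = 1 and MO = 1, while `dti_scalars(1.0, 1.0, 0.0)` describes planar diffusion and yields MO = −1.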
In family-based studies, the magnitude of genetic influences (i.e., heritability) on various DTI parameters of WM tracts, including FA, MD, AD, and RD, has been examined across a wide age range, from neonates 15, young children 16, older children 17, adolescents 18, and young adults 19 to middle-aged 20 and older adults 21. Participants in these studies are typically monozygotic and dizygotic twins or other family members. Table 1 of Vuoksimaa, Panizzon 20 lists 14 studies showing that a substantial proportion of variance in DTI parameters (FA, MD, AD, and RD) is explained by additive genetic effects. However, the genetic architecture of DTI parameters remains largely unknown because of a limitation of family-based studies: their heritability estimates rely on contrasting the phenotypic similarity of monozygotic and dizygotic twins and do not identify the contributing genetic variants. Genetic architecture denotes the characteristics of the genetic variations that contribute to the broad-sense heritability of a phenotype 22. Based on the number of genetic variants contributing to phenotypic variance, genetic architecture can be described as monogenic (one variant), oligogenic (few variants), polygenic (many variants), or omnigenic, which hypothesizes that almost all genetic variants have small but non-zero genetic contributions 23, 24. Uncovering the genetic architecture and discovering the associated genetic variants are essential steps to delineate the functional mechanisms and understand the genetic overlap between white matter structures and neuropsychiatric traits.
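To make the twin-contrast logic concrete, the sketch below applies Falconer's classical approximation, which derives heritability from the difference between monozygotic (MZ) and dizygotic (DZ) twin correlations. This is a simplification chosen for illustration: the family-based studies cited above typically fit structural equation (e.g., ACE) models rather than this closed-form estimate, and the input correlations in the example are hypothetical.

```python
def falconer_ace(r_mz: float, r_dz: float) -> tuple[float, float, float]:
    """Falconer's approximation of the ACE variance components from twin correlations.

    Returns (a2, c2, e2): additive genetic (narrow-sense heritability), shared
    environment, and unique environment proportions of phenotypic variance.
    """
    # MZ twins share all additive genetic effects, DZ twins on average half,
    # so the MZ-DZ difference reflects half of the additive genetic variance.
    a2 = 2.0 * (r_mz - r_dz)
    # MZ similarity not explained by genes is attributed to shared environment.
    c2 = r_mz - a2
    # MZ twins differ only through unique environment (and measurement error).
    e2 = 1.0 - r_mz
    return a2, c2, e2


# Hypothetical twin correlations for an FA measure (illustrative values only).
a2, c2, e2 = falconer_ace(r_mz=0.80, r_dz=0.50)
print(f"a2={a2:.2f}, c2={c2:.2f}, e2={e2:.2f}")  # a2=0.60, c2=0.20, e2=0.20
```

The estimate only partitions variance; it says nothing about which variants are responsible, which is the gap that SNP-based methods address.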
Recent developments have enabled heritability estimation and genetic variant discovery using common single-nucleotide polymorphism (SNP) data collected in general populations. Instead of relying on the expected genetic correlation derived from pedigree information, SNP heritability is estimated by aggregating the genetic effects across a large number of common SNPs (minor allele frequency [MAF]>0.05 or 0.01) 25, 26. The architecture of genetic influences can be assessed by SNP annotation and partitioning 27, 28. Genome-wide association studies (GWAS) and post-GWAS analyses can further identify associated genetic variants at the SNP, locus, and gene levels 29, 30, and assess the genetic overlap of complex traits in different domains 31, 32. Together with these methods, the genomic and imaging data from the recent large population-based United Kingdom (UK) Biobank resource 33 offer the opportunity to uncover the genetic basis of brain WM tracts in a single large-scale, relatively homogeneous population. The UK Biobank (UKB) has collected data from over 500,000 middle-aged and older participants (age range 40–69 at recruitment) and is currently following up 100,000 of these participants with brain MRI scans 34.
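The SNP-based alternative to pedigree relatedness can be sketched as follows: instead of expected kinship, a genetic relationship matrix (GRM) is computed from standardized genotypes across many SNPs, and SNP heritability is then estimated from how phenotypic similarity tracks this realized relatedness. The pure-Python function below is a minimal sketch assuming complete genotype data coded as 0/1/2 allele counts. It uses the simple standardized-genotype form for every entry, whereas GCTA applies a slightly different estimator on the diagonal, and the subsequent REML variance-component fit is not shown.

```python
import math


def genetic_relationship_matrix(genotypes: list[list[int]]) -> list[list[float]]:
    """Genetic relationship matrix from an n x m matrix of allele counts (0/1/2).

    Entry (j, k) is the average over SNPs of the product of the standardized
    genotypes of individuals j and k. Assumes no missing genotypes.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    columns = []  # standardized genotype vector for each informative SNP
    for i in range(m):
        counts = [row[i] for row in genotypes]
        p = sum(counts) / (2.0 * n)    # sample frequency of the counted allele
        var = 2.0 * p * (1.0 - p)      # genotype variance expected under HWE
        if var == 0.0:
            continue                   # monomorphic SNPs carry no information
        sd = math.sqrt(var)
        columns.append([(x - 2.0 * p) / sd for x in counts])
    if not columns:
        raise ValueError("no polymorphic SNPs")
    grm = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j, n):
            value = sum(col[j] * col[k] for col in columns) / len(columns)
            grm[j][k] = grm[k][j] = value
    return grm
```

Under the GREML model, the phenotypic covariance is then modelled as A·σ²g + I·σ²e, where A is this matrix, and SNP heritability is σ²g/(σ²g + σ²e).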
Rutten-Jacobs, Tozer 35 and Elliott, Sharp 36 performed GWAS of brain MRI phenotypes using the UKB brain imaging data released in 2017 (n~8,500). Elliott, Sharp 36 showed the ubiquitous impact of genetics on various brain imaging measures, and Rutten-Jacobs, Tozer 35 focused on DTI parameters and examined their genetic overlap with stroke, depression, and dementia. However, the sample sizes of these GWAS were limited, and only a few novel loci were detected. Here we generated 110 tract-based DTI parameters for a UKB sample of 17,706 participants of British ancestry. For each of the 110 phenotypes, we estimated the SNP heritability, assessed the distribution of genetic effects by SNP annotation and partitioning, and carried out GWAS to identify the associated genetic variants at the SNP and locus levels. In addition, we identified gene-level associations via MAGMA 37 and explored the functional consequences of the significant SNPs by functional mapping and annotation analysis (FUMA 30). To detect genetic overlap and pleiotropy between WM tracts and other complex traits, we performed association lookups at the SNP and gene levels in the NHGRI-EBI GWAS catalog 38 and estimated genetic correlations via linkage disequilibrium (LD) score regression (LDSC 32). As demonstrated below, hundreds of novel genetic associations were detected in the present GWAS, and our post-GWAS analysis revealed a much clearer picture of widespread pleiotropy with cognitive and mental health traits. The UKB GWAS results were further validated in an independent imaging genetics dataset. The GWAS summary statistics are publicly available at https://med.sites.unc.edu/bigs2/data/gwas-summary-statistics/.
We used data from 17,706 UKB individuals of British ancestry (self-reported ethnic background, Data-Field 21000). Ancestry was checked and confirmed using the top genetic principal components (GPCs, Data-Field 22009) provided by UKB 39 (Supplementary Figure 1). The dMRI data 34 and covariates were downloaded from the UKB data resource. We generated 110 DTI parameters: FA, AD, MD, MO, and RD of 21 WM tracts, and their average values across these tracts. The WM tracts were labelled by the ENIGMA-DTI pipeline 40, 41, which has been widely applied to measure variation in microstructural integrity 42–44. The IDs and full names of the 21 WM tracts are listed in Supplementary Table 1. A full description of the DTI preprocessing and analysis, imaging quality controls, white matter tracts, and formulas used to calculate the DTI parameters is provided in the supplementary information. An overview of the ENIGMA-DTI pipeline applied in this study is given in Supplementary Figure 2, and a few examples are shown in Supplementary Figures 3–6. For each continuous variable, we removed values more than five median absolute deviations from the median. All individuals were aged between 40 and 80 years, and 52.9% were female.
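The outlier rule above can be sketched in a few lines of Python. This is a minimal sketch under two assumptions not stated in the text: the median absolute deviation is used unscaled (no 1.4826 consistency factor), and an outlying value is set to missing rather than removing the whole participant.

```python
import statistics
from typing import Optional


def remove_mad_outliers(values: list[Optional[float]], k: float = 5.0) -> list[Optional[float]]:
    """Set values lying more than k median absolute deviations from the median to None."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    mad = statistics.median([abs(v - med) for v in observed])  # unscaled MAD (assumption)
    return [None if v is None or abs(v - med) > k * mad else v for v in values]


print(remove_mad_outliers([1.0, 2.0, 3.0, 4.0, 100.0]))  # [1.0, 2.0, 3.0, 4.0, None]
```

A median-based rule is used here because, unlike a mean/standard-deviation cutoff, the threshold itself is not inflated by the extreme values it is meant to catch.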
We downloaded the imputed SNP data from the UKB data resource 39 and performed the following additional quality controls using PLINK 45: excluding subjects with more than 10% missing genotypes, and including only SNPs with MAF > 0.01, genotyping rate > 90%, and Hardy-Weinberg equilibrium test p-value > 1*10−7. We also removed SNPs with an imputation INFO score below 0.8.
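The SNP-level thresholds can be expressed as a single filter. The sketch below is hypothetical (the container and function names are ours) and assumes per-SNP summaries have already been computed. In PLINK, the MAF, missingness, and Hardy-Weinberg filters correspond to the `--maf`, `--geno`, and `--hwe` options and the subject filter to `--mind`, while the INFO filter is applied separately using the imputation quality scores supplied with the UKB data.

```python
from dataclasses import dataclass


@dataclass
class SnpSummary:
    """Per-SNP statistics needed for QC (hypothetical container)."""
    snp_id: str
    maf: float        # minor allele frequency
    call_rate: float  # fraction of subjects with a non-missing genotype
    hwe_p: float      # Hardy-Weinberg equilibrium test p-value
    info: float       # imputation INFO score


def passes_snp_qc(s: SnpSummary) -> bool:
    """Apply the SNP thresholds described in the text."""
    return (
        s.maf > 0.01
        and s.call_rate > 0.90
        and s.hwe_p > 1e-7
        and s.info >= 0.8   # SNPs with INFO below 0.8 are removed
    )


def passes_subject_qc(missing_rate: float) -> bool:
    """Keep subjects with at most 10% missing genotypes."""
    return missing_rate <= 0.10
```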
For each of the 110 DTI parameters, we estimated the proportion of variation explained by all autosomal SNPs using univariate GCTA-GREML analysis 25. We included fixed effects of age (at imaging), age-squared, gender, age-gender interaction, age-squared-gender interaction, and the top 40 genetic principal components. We also estimated the proportion of variation explained by the SNPs on each chromosome. In addition, we performed cell-type-specific SNP heritability analysis: SNPs were grouped according to their functional activity in various cell groups 28, and specifically in the central nervous system (CNS) cell group, into CNS_active, CNS_inactive, and Always_inactive sets (see supplementary information for detailed definitions). We performed GWAS for each DTI parameter separately with PLINK 45. The same set of covariates as in the GCTA-GREML analysis was adjusted for in GWAS and in all other analyses unless stated otherwise.
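For concreteness, the fixed-effect covariates listed above can be assembled per participant as below. This is a hypothetical helper; the sex coding (1 = female) and the absence of any centering of age are our assumptions, as the text does not specify them.

```python
def covariate_row(age: float, female: bool, pcs: list[float]) -> list[float]:
    """Fixed-effect covariates: age, age^2, sex, age x sex, age^2 x sex, and 40 PCs."""
    if len(pcs) != 40:
        raise ValueError("expected the top 40 genetic principal components")
    sex = 1.0 if female else 0.0  # coding is an assumption
    age2 = age * age
    return [age, age2, sex, age * sex, age2 * sex, *pcs]
```

Each participant thus contributes 45 covariate values; the two interaction terms let the age trajectory differ between sexes.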
We characterized genomic risk loci using the FUMA 30 online platform (v1.3.4). FUMA first identified independent significant SNPs, defined as significant SNPs that were independent of each other (R2<0.6). FUMA then constructed an LD block for each independent significant SNP by tagging all SNPs that had MAF ≥ 0.0005 and were in LD (R2≥0.6) with at least one of the independent significant SNPs. If the LD blocks of independent significant SNPs were close (<250 kb apart, based on the closest boundary SNPs of the LD blocks), they were merged into a single genomic locus. More details of the FUMA analysis can be found in Watanabe, Taskesen 30. Independent significant SNPs and all SNPs in LD with them were subsequently searched in the NHGRI-EBI GWAS catalog 38 (v2019–01-31) for previously reported associations (p-value<9*10−6) with any trait.
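The final merging step can be sketched as a sweep over LD blocks sorted by position. This is a simplified sketch assuming each LD block has already been reduced to its chromosome and the positions of its outermost boundary SNPs; it is not FUMA's actual code, and it omits the earlier steps of defining independent significant SNPs and tagging SNPs in LD.

```python
def merge_ld_blocks(
    blocks: list[tuple[int, int, int]], max_gap: int = 250_000
) -> list[tuple[int, int, int]]:
    """Merge (chromosome, start_bp, end_bp) LD blocks into genomic loci.

    Adjacent blocks on the same chromosome are merged when the distance between
    their closest boundary SNPs is below max_gap (overlapping blocks always merge).
    """
    loci: list[tuple[int, int, int]] = []
    for chrom, start, end in sorted(blocks):
        if loci and loci[-1][0] == chrom and start - loci[-1][2] < max_gap:
            _, locus_start, locus_end = loci[-1]
            loci[-1] = (chrom, locus_start, max(locus_end, end))
        else:
            loci.append((chrom, start, end))
    return loci
```

Because blocks are processed in sorted order, a chain of blocks each within 250 kb of the next collapses into one locus even if its two ends are far apart.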
We carried out gene-based association analysis for 18,796 protein-coding candidate genes via MAGMA 37 (v1.07). Gene-based p-values were calculated by summarizing the GWAS results of the SNPs mapped to each gene according to their physical positions. Significant genes were searched in the NHGRI-EBI GWAS catalog 38 (v2019–01-31) for previously reported associations with any trait. We focused on brain-related complex traits and categorized them into six groups: cognitive (e.g., general cognitive ability, cognitive performance, math ability, and intelligence), education (e.g., years of education and college completion), reaction time, neuroticism, neurodegenerative diseases (e.g., Alzheimer’s disease, Parkinson’s disease, and corticobasal degeneration), and neuropsychiatric disorders (e.g., major depressive disorder [MDD], SCZ, bipolar disorder [BD], ADHD, alcohol use disorder, and autism spectrum disorder).
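The positional assignment of SNPs to genes that precedes the gene-level test can be sketched as follows. The sketch assumes gene boundaries are given as (chromosome, start, end) and uses an optional symmetric window that defaults to zero; the window size used in this study is not stated, and the gene-level statistic that MAGMA then computes from the mapped SNPs (which accounts for LD between them) is not reproduced here. Gene and SNP names in any example are hypothetical.

```python
def map_snps_to_genes(
    snps: dict[str, tuple[int, int]],
    genes: dict[str, tuple[int, int, int]],
    window: int = 0,
) -> dict[str, list[str]]:
    """Assign SNPs to genes by physical position.

    snps: SNP id -> (chromosome, position); genes: gene -> (chromosome, start, end).
    A SNP maps to a gene if it lies within [start - window, end + window].
    """
    mapping: dict[str, list[str]] = {}
    for gene, (g_chrom, g_start, g_end) in genes.items():
        lo, hi = g_start - window, g_end + window
        hits = sorted(
            s for s, (chrom, pos) in snps.items() if chrom == g_chrom and lo <= pos <= hi
        )
        if hits:
            mapping[gene] = hits
    return mapping
```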
We also performed functional gene annotation and mapping via FUMA. SNPs were annotated with their biological functionality and then linked to genes by a combination of positional, expression quantitative trait locus (eQTL) association, and 3D chromatin interaction mapping. Specifically, independent significant SNPs and all SNPs in LD with them were annotated for their functional consequences on genes by ANNOVAR 46. The annotated SNPs were mapped to 35,808 candidate genes based on physical position on the genome (tissue/cell types for 15-core chromatin state: brain), eQTL associations (tissue types: GTEx v7 brain 47, BRAINEAC 48, and CommonMind Consortium 49), and chromatin interaction mapping (built-in chromatin interaction data: dorsolateral prefrontal cortex, hippocampus 50; annotated enhancer/promoter regions: E053-E082 brain 51). We used default values for all other FUMA parameters.
We used LDSC (v1.0.0, https://github.com/bulik/ldsc) to estimate pairwise genetic correlations between DTI parameters and other traits from their GWAS summary statistics. We used the pre-calculated LD scores provided by LDSC (https://data.broadinstitute.org/alkesgroup/LDSCORE/), which were computed from 1000 Genomes European data. We restricted the analysis to HapMap3 SNPs and removed SNPs in the MHC region on chromosome 6.
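For reference, cross-trait LD score regression rests on the following expectation for the product of the z-scores of SNP j in two GWAS, as given in the original LDSC genetic correlation method. The notation here is ours, and the sketch omits details such as regression weights and block-jackknife standard errors. Here ℓ_j is the LD score of SNP j, M the number of SNPs, N_1 and N_2 the GWAS sample sizes, N_s the number of overlapping samples, ρ the phenotypic correlation among overlapping samples, ρ_g the genetic covariance, and h_1^2, h_2^2 the SNP heritabilities from univariate LDSC.

```latex
\mathbb{E}\left[z_{1j} z_{2j}\right]
  = \frac{\sqrt{N_1 N_2}\,\rho_g}{M}\,\ell_j
  + \frac{\rho\, N_s}{\sqrt{N_1 N_2}},
\qquad
r_g = \frac{\rho_g}{\sqrt{h_1^2\, h_2^2}}
```

Because sample overlap enters only through the intercept, the slope (and hence r_g) is not biased by shared participants between the two GWAS.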
Genome-wide polygenic risk scores 52 were constructed to examine the out-of-sample prediction ability of the UKB GWAS results. Two procedures were used to account for LD structure: 1) LD-based pruning (window size 50, step 5, R-squared = 0.2); and 2) posterior effect size estimation under a continuous shrinkage prior with an external LD reference panel 53. For each procedure, we tried five p-value thresholds for predictor selection: 1, 0.5, 0.05, 5*10−4, and 5*10−8. Thus, ten polygenic scores were generated via PLINK, and we report the best prediction accuracy achieved by any single one of these ten scores. Besides same-trait prediction, we also used cross-trait polygenic risk scores 54 to validate the significant genetic correlations observed between DTI parameters and other brain-related traits. The association between each polygenic score and phenotype was estimated and tested in a linear regression model adjusting for age and sex. The additional phenotypic variation explained by the polygenic score (i.e., the incremental R-squared) was used to measure prediction accuracy.
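The scoring and evaluation steps above can be sketched in plain Python. This is a minimal sketch assuming the SNPs have already been LD-pruned (in PLINK, for instance, with `--indep-pairwise 50 5 0.2`) and that effect sizes are aligned to the counted allele; the continuous-shrinkage procedure is not shown, and the helper names are hypothetical. The incremental R-squared is the gain in R-squared when the score is added to a covariates-only linear model.

```python
def polygenic_score(genotypes: dict[str, float], betas: dict[str, float],
                    pvalues: dict[str, float], threshold: float) -> float:
    """Weighted allele count over SNPs whose GWAS p-value is at or below threshold."""
    return sum(genotypes[s] * betas[s] for s, p in pvalues.items()
               if p <= threshold and s in genotypes)


def r_squared(y: list[float], X: list[list[float]]) -> float:
    """R^2 of an ordinary least squares fit of y on the columns of X plus an intercept."""
    rows = [[1.0, *x] for x in X]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda i: abs(a[i][c]))  # partial pivoting
        a[c], a[pivot] = a[pivot], a[c]
        for i in range(k):
            if i != c:
                f = a[i][c] / a[c][c]
                a[i] = [vi - f * vc for vi, vc in zip(a[i], a[c])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - ss_res / ss_tot


def incremental_r2(y: list[float], covariates: list[list[float]], scores: list[float]) -> float:
    """Extra variance explained when the polygenic score is added to the covariates."""
    full = [[*c, s] for c, s in zip(covariates, scores)]
    return r_squared(y, full) - r_squared(y, covariates)
```

In the setting described above, `incremental_r2` would be evaluated for each of the ten scores (two LD-handling procedures times five thresholds), with age and sex as the covariate columns.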
Figures 1–2, Supplementary Figures 7–12, and Supplementary Video 1 display the SNP heritability of DTI parameters estimated from all common autosomal SNPs. The associated standard errors and the raw and Bonferroni-adjusted p-values from one-sided likelihood ratio tests are given in Supplementary Table 2. All SNP heritability estimates were significantly larger than zero (Bonferroni-adjusted p-value<0.004). Genetic factors accounted for a moderate or large portion of the variance of DTI parameters in all WM tracts (mean heritability 0.487, with standard errors around 0.041). For example, genetic effects explained more than 60% of the total variance of FA in the posterior limb of the internal capsule (PLIC), anterior corona radiata (ACR), superior longitudinal fasciculus (SLF), and cingulum cingulate gyrus (CGC). The lowest SNP heritability of FA across all WM tracts was found in the fornix (FX, 37%) and corticospinal tract (CST, 27%). According to the functions of WM tracts (Connectopedia Knowledge Database, http://www.fmritools.com/kdb/white-matter/), we clustered them into four communities: complex fibers (C1: ACR, ALIC, PCR, PLIC, PTR, RLIC, SCR, EC, SS), associative fibers (C2: CGC, CGH, FX, FXST, IFO, SFO, SLF, UNC), commissural fibers (C3: BCC, GCC, SCC), and projection fibers (C4: CST) (Figure 1; see Supplementary Table 1 for IDs). WM tracts in C1 and C3 (mean=0.512) tended to have higher SNP heritability than those in C2 and C4 (mean=0.440, p-value=2.16*10−04).
To examine the distribution of SNP heritability across the genome, we partitioned the SNP data into 22 chromosomes and estimated the SNP heritability attributable to each chromosome (Supplementary Table 3). The mean heritability explained by each chromosome across all 110 DTI parameters was linearly associated with chromosome length (Figure 3(a), R2=61.2%, p-value=1.67*10−05). This finding suggests a highly polygenic or omnigenic genetic architecture 24 of WM tracts, in which the many SNPs contributing to variation in DTI parameters are spread across the whole genome. To further illustrate this architecture, we ordered the 22 chromosomes by length and clustered them into three groups: long, medium, and short. The long group had 4 chromosomes (CHRs 2, 1, 6, 3), which together accounted for 33% of the length of the whole genome; the medium group had 6 chromosomes (CHRs 4, 5, 7, 8, 10, 11), which accounted for another 33%; and the short group consisted of the remaining 12 chromosomes. Figure 3(b) shows the SNP heritability estimates grouped by chromosome length. Longer chromosomes tended to have higher SNP heritability estimates than medium (p-value=3.82*10−13) or short (p-value<2.20*10−16) ones for DTI parameters.
To compare the contributions of SNPs with different activity levels, we partitioned the genetic variation according to CNS-cell-specific annotations: CNS_active, CNS_inactive, and Always_inactive (Supplementary Table 4). Heritability estimated by SNPs residing in chromatin regions inactive across all cell groups (Always_inactive) was much smaller than that estimated by SNPs residing in chromatin regions active in CNS cells (CNS_active, p-value<2.20*10−16). The heritability estimated by CNS_inactive SNPs (inactive in CNS cells but active in other cells) was also significantly smaller than that of CNS_active SNPs (p-value=8.95*10−12) (Figure 3(c)). This pattern was consistent across all five types of DTI parameters, although larger variance was observed for the MO parameters.
We carried out GWAS for the 110 DTI parameters using 8,955,960 SNPs that passed quality control. All Manhattan and QQ plots are shown in Supplementary Figure 13. In total, 19,530 significant associations were detected at the 4.5*10−10 significance level (that is, 5*10−8/110, adjusted for testing multiple phenotypes) (Supplementary Figure 14, Supplementary Table 5). RD and MD of the anterior limb of the internal capsule (ALIC) had more than 3,000 significant associations. FUMA summarized the significant SNPs into 213 independent significant SNPs, which had 696 independent significant associations with 90 DTI parameters (Figure 4, Supplementary Tables 6–7). RD and FA of the splenium of the corpus callosum (SCC) had the largest numbers of independent significant SNPs. Of the 696 independent significant associations, 502 were located on chromosome 5 (Supplementary Table 8, Supplementary Figure 15). The 696 independent significant SNP-level associations can be further characterized as 205 locus-level associations (Supplementary Table 9). FA and RD of SCC, FA and AD of FX, and RD of ALIC had at least five genetic risk loci (Supplementary Table 10). Every chromosome except chromosomes 13, 20, and 21 had at least one genetic risk locus, and chromosome 5 had the largest number of risk loci (Supplementary Table 11). Enrichment of GWAS signals for DTI parameters on chromosome 5, particularly at the chr5q14 locus, was also reported by Rutten-Jacobs, Tozer 35. Further research is needed to explore the biological role of chromosome 5 in microstructural integrity changes measurable by dMRI. GWAS results at the 5*10−9 and 5*10−8 significance levels are also reported in the tables and figures above.
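The study-wide threshold follows from a Bonferroni correction of the conventional genome-wide level for the 110 phenotypes. The short Python check below reproduces the arithmetic (5*10−8/110 is approximately 4.55*10−10, reported as 4.5*10−10) and shows how a list of (SNP, phenotype, p-value) records would be screened against it; the function name and any records passed to it are hypothetical placeholders, not study data.

```python
GENOME_WIDE_ALPHA = 5e-8
N_PHENOTYPES = 110
STUDY_WIDE_ALPHA = GENOME_WIDE_ALPHA / N_PHENOTYPES  # Bonferroni over phenotypes


def significant(
    results: list[tuple[str, str, float]], alpha: float = STUDY_WIDE_ALPHA
) -> list[tuple[str, str, float]]:
    """Keep (snp, phenotype, p-value) records below the study-wide threshold."""
    return [r for r in results if r[2] < alpha]


print(f"{STUDY_WIDE_ALPHA:.3e}")  # 4.545e-10
```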
Association lookups in the NHGRI-EBI GWAS catalog 38 found that 122 of the 213 independent significant SNPs (associated with 83 DTI parameters) had previously been reported to be associated with at least one trait (Supplementary Table 12). Our study replicated many SNPs reported in previous GWAS of WM hyperintensity measures and other brain structural measures (Supplementary Table 13), most of which were recently detected by Rutten-Jacobs, Tozer 35 (n=8,448). In addition, we tagged 15 distinct SNPs previously associated with neuropsychiatric disorders, 40 with cognitive traits, 12 with education, 47 with neuroticism, 17 with neurodegenerative diseases, and 2 with reaction time. We also compared our results with those reported by Elliott, Sharp 36 (n=8,428) and found that 212 of their 368 significant associations (Supplementary Table 6 of Elliott, Sharp 36) were replicated in the present study.
Gene-based association analysis identified 508 significant gene-level associations (p-value<2*10−8, adjusted for testing multiple phenotypes) between 112 genes and 96 DTI parameters (Supplementary Table 14). Our results replicated genes discovered by Rutten-Jacobs, Tozer 35 and Elliott, Sharp 36, including VCAN, C16orf95, NBEAL1, SH3PXD2A, CACNB2, SRA1, GNA12, CPED1, and EPHA3, but most of the identified genes had not previously been linked to DTI parameters. Association lookups found that 51 of the 112 significant genes had been implicated in cognitive, education, reaction time, neuroticism, neuropsychiatric, and neurodegenerative traits in previous studies, such as CRHR1 55–58, MAPT 59–62, KANSL1 63–65, and MSRA 66–68 (Supplementary Table 15, Figure 5). We also annotated the SNPs by their functional consequences on genes (Supplementary Figure 16) and performed functional gene mapping, which identified 292 genes (Supplementary Table 16), 218 of which were not detected in the gene-based association analysis.
We estimated the pairwise genetic correlations between the 110 DTI parameters and 14 other complex traits (Supplementary Table 17), focusing on traits that showed evidence of pleiotropy in the association lookups. Forty-three pairs of phenotypes had significant genetic correlations after adjusting for multiple comparisons (1,540 tests) using the Benjamini-Hochberg (B-H) procedure 69 at the 0.05 level (Supplementary Table 18, Supplementary Figure 17). Reaction time had significant negative genetic correlations with FA parameters (mean=−0.181) and widespread positive correlations with AD, MO, MD, and RD (mean=0.165) (Supplementary Figure 18). Education, cognitive, intelligence, and numerical reasoning traits also had positive genetic correlations with AD, FA, and MO. In contrast, depression, MDD, and drinking frequency showed negative genetic correlations with FA. The remaining pairs were not significant after multiple testing adjustment.
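The Benjamini-Hochberg step-up procedure used here (1,540 = 110 × 14 tests) can be sketched as follows. This is a generic textbook implementation written for illustration, not the authors' code.

```python
def benjamini_hochberg(pvalues: list[float], q: float = 0.05) -> list[bool]:
    """Return True for each hypothesis rejected at false discovery rate q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    # Largest rank k such that the k-th smallest p-value is <= k * q / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    # Step-up rule: reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected
```

For example, `benjamini_hochberg([0.03, 0.04])` rejects both hypotheses even though 0.03 exceeds its own rank-1 threshold of 0.025, because the step-up rule rejects everything ranked at or below the largest rank that passes.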
We also estimated the pairwise genetic correlations between the 110 DTI parameters and 100 regional brain volume measures (ROIs, Supplementary Table 19). We found widespread genetic overlap between DTI parameters and brain volumes (Supplementary Figures 19–23): 490 pairs were significant after adjusting for multiple comparisons (11,000 tests) using the B-H procedure at the 0.05 level. For example, white matter volume had significantly positive genetic correlations with FA of the BCC, CGC, FX, FXST, and GCC WM tracts. All genetic correlation estimates and associated p-values can be found in Supplementary Table 20.
To validate the UKB GWAS results, we repeated the GWAS of the 110 DTI parameters on data from the Philadelphia Neurodevelopmental Cohort 70 (PNC) study (n=520). More details about the PNC dataset and GWAS can be found in the supplementary information. Because of the small sample size, the probability of reaching genome-wide significance in the PNC data was low. Therefore, we focused on the 3,954,646 SNPs present in both datasets and checked whether the effect signs of the top UKB GWAS SNPs were concordant between the two studies 71. For the 5,625 significant UKB associations (88 DTI parameters, 4.5*10−10 significance level), 85.4% (4,803) had the same effect sign in both studies. In addition, 64 of the 88 DTI parameters had an effect-sign concordance rate above 95% (Supplementary Table 21). We also assessed the prediction accuracy of the UKB GWAS results on the PNC data with genome-wide polygenic risk scores 52. After adjusting for multiple comparisons using the B-H procedure at the 0.05 level, 104 of the 110 UKB-derived polygenic scores were significantly associated with the corresponding DTI parameter in the PNC dataset (Supplementary Table 22). The significant polygenic scores accounted for up to 2.95% of phenotypic variation, with the largest R-squared (2.95%) found for AD of ALIC (p-value=5.15*10−10). Other DTI parameters with R-squared larger than 2% included AD of FX (R-squared=2.36%, p-value=3.06*10−8), AD of SLF (R-squared=2.36%, p-value=7.48*10−9), average AD (R-squared=2.21%, p-value=5.89*10−10), MO of SLF (R-squared=2.12%, p-value=5.92*10−8), MO of ACR (R-squared=2.06%, p-value=2.51*10−7), and FA of CGH (R-squared=2.00%, p-value=1.22*10−7). In summary, the joint analysis with the PNC dataset shows a moderate to high level of agreement in terms of GWAS effect signs and indicates that the UKB GWAS summary statistics have widespread out-of-sample prediction power across WM tracts.
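The effect-sign check can be sketched as below. It assumes that the two sets of effect estimates are already aligned to the same effect allele (a step not described here) and treats a zero estimate as discordant; the SNP identifiers in any example are hypothetical. Applied to the counts reported above, 4,803 concordant signs out of 5,625 associations gives 4,803/5,625 ≈ 0.854, i.e., 85.4%.

```python
def sign_concordance(beta_discovery: dict[str, float], beta_replication: dict[str, float]) -> float:
    """Fraction of shared SNPs whose effect estimates have the same sign in both studies."""
    shared = [s for s in beta_discovery if s in beta_replication]
    if not shared:
        raise ValueError("no SNPs in common")
    # A positive product means both estimates point in the same direction.
    same = sum(1 for s in shared if beta_discovery[s] * beta_replication[s] > 0)
    return same / len(shared)
```

Under the null of no shared signal, concordance would be expected near 50%, which is why rates well above that level are informative even when individual replication p-values are not significant.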
We also constructed cross-trait polygenic risk scores 54, 72, 73 for PNC subjects to validate the genetic overlap between DTI parameters and brain-related behavioral traits. Of the 43 significant genetic correlation pairs observed in the UKB LDSC analysis, 26 were significant (p-value range=[4.16*10−11, 2.67*10−2]) after adjusting for multiple comparisons using the B-H procedure at the 0.05 level (Supplementary Table 23). In particular, reaction time-derived polygenic scores replicated the significant genetic correlations with 18 DTI parameters, and depression- and educational attainment-derived polygenic scores each validated 2 DTI parameters.
Heritability and GWAS analyses can guide downstream analyses that model the functional mechanisms and pathways involved in the phenotype of interest or its pleiotropic traits. A large number of family-based neuroimaging studies have documented that WM tracts are substantially heritable across the lifespan. Two recent GWAS 35, 36 explored the genetic risk variants of DTI parameters; however, they were underpowered because of their limited sample sizes (n<9,000). Compared with these previous GWAS, the present study makes novel contributions by 1) characterizing the genetic landscape of WM tracts via chromosome-specific SNP heritability analysis; 2) identifying novel genetic risk variants for many DTI parameters; 3) performing gene-based association analysis and functional gene mapping with eQTL and chromatin interaction data; 4) uncovering statistical pleiotropy 31, 74 with other brain-related complex traits; and 5) examining the out-of-sample prediction ability of the UKB GWAS results.
Our SNP heritability estimates are close to those reported in previous family-based studies (e.g., Table 1 of Vuoksimaa, Panizzon 20) and within a similar range as those reported by Elliott, Sharp 36, where the mean heritability is around 0.450. These results suggest that studies of DTI phenotypes using common SNPs may be more informative than studies focused on rare variants. Our analyses partitioning the genetic variation by chromosome and by SNP functional set shed light on the distribution of genetic signals across the genome and across functional categories. These findings suggest a highly polygenic genetic architecture of DTI parameters and also provide evidence for stronger genetic signals from SNPs in active chromatin regions, especially those active in CNS cells. For such highly polygenic traits, a large sample size is essential for GWAS to discover the widespread genetic signals. With a larger sample size, our study identifies hundreds of new genetic associations at the variant, locus, and gene levels. More importantly, these novel findings help uncover widespread pleiotropy between DTI parameters and cognitive and mental health traits. Small but significant genetic correlations were quantified between DTI parameters and other brain-related complex traits. As the UKB releases more imaging data, better-powered genetic studies of heritable WM tracts can be expected to further facilitate gene discovery and help clarify the causal relationships among brain-related complex traits.
Our analyses reflect several methodological limitations of current approaches in population-based imaging genetic studies. First, similar to previous studies 19, we found low SNP heritability for CST and FX, which may be because such small, tubular tracts cannot be well registered and reliably resolved with current techniques 75. Second, heritability estimated from SNP data reflects narrow-sense heritability, which only considers the additive genetic effects of common variants. The genetic architecture may change as future studies consider all genetic contributions more broadly (such as rare variants, non-additive effects, and gene-gene interactions). However, it is notable that with common SNPs in the UK Biobank, we obtained heritability estimates comparable to those reported in family-based studies. Finally, the UKB data used in this study were sampled from a specific cohort (British ancestry) within a specific age range. Because genetic ancestry is a common confounder and aging can play an important role in brain WM structural changes, caution is needed when generalizing these findings to the general population or to specific clinical cohorts. With more data from diverse imaging genetics studies, future research will be required to overcome these limitations and advance our biological understanding of the human brain.
This research was partially supported by U.S. NIH grants MH086633 and MH116527, and a grant from the Cancer Prevention Research Institute of Texas. We thank the individuals represented in the UK Biobank and the Philadelphia Neurodevelopmental Cohort (PNC) datasets for their participation and the research teams for their work in collecting, processing and disseminating these datasets for analysis. This research has been conducted using the UK Biobank resource (application number 22783), subject to a data transfer agreement. Ethics approval for the UK Biobank study was obtained from the North West Centre for Research Ethics Committee (11/NW/0382). For the PNC study, the institutional review boards of both the University of Pennsylvania and the Children’s Hospital of Philadelphia approved all study procedures. Informed consent was obtained from all subjects. We gratefully acknowledge all the studies and databases that made their GWAS summary data available. The authors acknowledge the Texas Advanced Computing Center (TACC, http://www.tacc.utexas.edu) at The University of Texas at Austin for providing HPC and storage resources that have contributed to the research results reported within this paper.
The authors declare no competing financial interests.
We made use of publicly available software and tools. All code used to generate the results reported in this paper is available upon request.
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44.
45.
46.
47.
48.
49.
50.
51.
52.
53.
54.
55.
56.
57.
58.
59.
60.
61.
62.
63.
64.
65.
66.
67.
68.
69.
70.
71.
72.
73.
74.


SNP heritability estimates grouped by white matter tract functions. The white matter tracts are clustered into four communities including complex fibers (C1), associative fibers (C2), commissural fibers (C3), and projection fibers (C4) according to the Connectopedia Knowledge Database, http://www.fmritools.com/kdb/white-matter/


Distribution of SNP heritability estimates of the 21 white matter tracts in brain.


Heritability estimated by SNPs in each chromosome or in functionally annotated SNP categories.


Number of independent significant SNPs discovered for each DTI parameter at different GWAS significance levels. Outer layer: p-value <5*10−8; middle layer: p-value <5*10−9; and inner layer: p-value <4.5*10−10.


Genes identified in gene-based association analysis of DTI parameters that have been implicated in traits of neuroticism, neurodegenerative diseases, neuropsychiatric disorders, education, cognition, and reaction time in previous GWAS.