Proceedings of the National Academy of Sciences of the United States of America

Home The evolution of skin pigmentation-associated variation in West Eurasia

The evolution of skin pigmentation-associated variation in West Eurasia

Dan Ju, Iain Mathieson

Edited by Anne C. Stone, Arizona State University, Tempe, AZ, and approved November 4, 2020 (received for review May 11, 2020)

Author contributions: D.J. and I.M. designed research; D.J. and I.M. performed research; I.M. contributed new reagents/analytic tools; D.J. analyzed data; and D.J. and I.M. wrote the paper.

https://doi.org/10.1073/pnas.2009227118, Volume: 118, Issue: 1, Pages: 1-8

Article Type: Research Article Article History

Publisher: National Academy of Sciences

Altmetric

Table of Contents

Results
Discussion
Methods
Supplementary Material
Code Availability.

Abstract

Some of the genes responsible for the evolution of light skin pigmentation in Europeans show signals of positive selection in present-day populations. Recently, genome-wide association studies have highlighted the highly polygenic nature of skin pigmentation. It is unclear whether selection has operated on all of these genetic variants or just a subset. By studying variation in over a thousand ancient genomes from West Eurasia covering 40,000 y, we are able to study both the aggregate behavior of pigmentation-associated variants and the evolutionary history of individual variants. We find that the evolution of light skin pigmentation in Europeans was driven by frequency changes in a relatively small fraction of the genetic variants that are associated with variation in the trait today.

Skin pigmentation is a classic example of a polygenic trait that has experienced directional selection in humans. Genome-wide association studies have identified well over a hundred pigmentation-associated loci, and genomic scans in present-day and ancient populations have identified selective sweeps for a small number of light pigmentation-associated alleles in Europeans. It is unclear whether selection has operated on all of the genetic variation associated with skin pigmentation as opposed to just a small number of large-effect variants. Here, we address this question using ancient DNA from 1,158 individuals from West Eurasia covering a period of 40,000 y combined with genome-wide association summary statistics from the UK Biobank. We find a robust signal of directional selection in ancient West Eurasians on 170 skin pigmentation-associated variants ascertained in the UK Biobank. However, we also show that this signal is driven by a limited number of large-effect variants. Consistent with this observation, we find that a polygenic selection test in present-day populations fails to detect selection with the full set of variants. Our data allow us to disentangle the effects of admixture and selection. Most notably, a large-effect variant at SLC24A5 was introduced to Western Europe by migrations of Neolithic farming populations but continued to be under selection post-admixture. This study shows that the response to selection for light skin pigmentation in West Eurasia was driven by a relatively small proportion of the variants that are associated with present-day phenotypic variation.

Keywords

evolution, ancient DNA, skin pigmentation, polygenic selection, complex traits

License: CC BY

Juand Mathieson: The evolution of skin pigmentation-associated variation in West Eurasia

Skin pigmentation exhibits a gradient of variation across human populations that tracks with latitude (1). This gradient is thought to reflect selection for lighter skin pigmentation at higher latitudes, as lower UVB exposure reduces vitamin D biosynthesis, which affects calcium homeostasis and immunity (2 3–4). Studies of present-day and ancient populations have revealed signatures of selection at skin pigmentation loci (5 6 7 8–9), and single-nucleotide polymorphisms (SNPs) associated with light skin pigmentation at some of these genes exhibit a signal of polygenic selection in Western Eurasians (10). However, this observation, the only documented signal of polygenic selection for skin pigmentation, is based on just four loci (SLC24A5, SLC45A2, TYR, and APBA2/OCA2).

Therefore, while the existence of selective sweeps at a handful of skin pigmentation loci is well established, the evidence for polygenic selection—a coordinated shift in allele frequencies across many trait-associated variants (11)—is less clear. Recently, genome-wide association studies (GWAS) of larger samples and more diverse populations (12 13 14–15) have emphasized the polygenic architecture of skin pigmentation. This raises the question of whether selection on skin pigmentation acted on all of this variation or was indeed driven by selective sweeps at a relatively small number of loci (11).

The impact of demographic transitions on the evolution of skin pigmentation also remains an open question. The Holocene history (∼12,000 y before present [BP]) of Europe was marked by waves of migration and admixture between three highly diverged populations: hunter-gatherers, Early Farmers, and Steppe ancestry populations (16, 17). Skin pigmentation-associated loci may have been selected independently or in parallel in one or more of these source populations or may instead only have been selected after admixture. The impact of ancient shifts in ancestry is difficult to resolve using present-day data, but using ancient DNA, we can separate the effects of ancestry and selection and identify which loci were selected in which populations.

Here, we use ancient DNA to track the evolution of loci that are associated with skin pigmentation in present-day Europeans. Although we cannot make predictions about the phenotypes of ancient individuals or populations, we can assess the extent to which they carried the same light pigmentation alleles that are present today. This allows us to identify which pigmentation-associated variants have changed in frequency due to positive selection and the timing of these selective events. We present a systematic survey of the evolution of European skin pigmentation-associated variation, tracking over a hundred loci over 40,000 y of human history—almost the entire range of modern human occupation of Europe.

Results

Skin Pigmentation-Associated SNPs Show a Signal of Positive Selection in Ancient West Eurasians.

We obtained skin pigmentation-associated SNPs from the UK Biobank GWAS for skin color released by the Neale Lab (14). We analyzed the evolution of these variants using two datasets of publicly available ancient DNA (Fig. 1 A–C and SI Appendix, Fig. S1). The first (“capture-shotgun”) consists of data from 1,158 individuals dating from 45,000–715 y BP genotyped at ∼1.2 million variants (7, 16, 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51–52). The second (“shotgun”) is a subset of 248 individuals with genome-wide shotgun sequence data (16, 18 19 20 21 22 23 24–25, 28, 32, 35, 38, 40 41 42 43–44, 47, 49, 51, 53 54 55 56 57 58 59–60). This smaller dataset allows us to capture variants that may not be well tagged by the genotyped SNPs in the capture-shotgun dataset. Weighting variants by their GWAS-estimated effect sizes, we show that the polygenic score—the weighted proportion of dark pigmentation alleles—decreased significantly over the past 40,000 y (Fig. 1 A and B, P < 1 × 10⁻⁴ based on a genomic null distribution and accounting for changes in ancestry; SI Appendix, Fig. S2). We recapitulate this significant decrease using a different set of UK Biobank GWAS summary statistics calculated using linear mixed models (61) (SI Appendix, Fig. S3) and identifying independent signals based on predefined approximately independent linkage disequilibrium (LD) blocks (62) (SI Appendix, Fig. S4), rather than LD clumping. The genetic scores of the oldest individuals in the dataset fall within the range of present-day West African populations, showing that Early Upper Paleolithic (∼50–20,000 y BP) individuals, such as Ust’Ishim, carried few of the light skin pigmentation alleles that are common in present-day Europe.

Fig. 1.

Genetic scores for skin pigmentation over time. The solid lines indicate fitted mean scores with gray 95% confidence intervals. Weighted scores based on UK Biobank SNPs and unweighted scores based on manually curated SNPs for samples in the (A and D) shotgun dataset, (B and E) capture-shotgun dataset, and (C and F) capture-shotgun dataset dated within the past 15,000 y. Scores of labeled samples are plotted but not included in regressions. Averages for 1000 Genomes superpopulations are plotted at time 0. Area of points scale with number of SNPs genotyped in the individual.

Over the past 15,000 y, which covers most of the data, the polygenic score decreased (P < 1 × 10⁻⁴) at a similar rate to the estimated rate over the 40,000-y period (5.27 × 10⁻⁵ per year vs. 3.47 × 10⁻⁵ per year; P = 0.04). Stratifying individuals by ancestry (hunter-gatherer, Early Farmer, and Steppe) using ADMIXTURE revealed baseline differences in genetic score across groups (Fig. 1C). Mesolithic hunter-gatherers carried fewer light pigmentation alleles than Early Farmer or Steppe ancestry populations. This difference (0.091 score units) was similar to the difference between Mesolithic hunter-gatherers and Early Upper Paleolithic populations (0.097 score units). We tested for evidence of change in score over time within groups while controlling for changes in ancestry, finding no significant change within hunter-gatherer or Early Farmer populations (P = 0.32; P = 0.29). Steppe ancestry populations, however, exhibited a significant decrease (P = 0.007), with a steeper slope than that of Early Farmer or hunter-gatherer (P = 0.004 and P = 0.08, respectively).

While the UK Biobank is well powered to detect variation that is segregating in present-day Europeans, it may not detect variants that are no longer segregating. We therefore manually curated a separate list of 18 SNPs identified as associated with skin pigmentation from seven studies of populations representing diverse ancestries (Dataset S1E). Using this set of SNPs to generate unweighted scores, we observed similar results as with the UK Biobank ascertained SNPs (Fig. 1 D–F). We find a significant decrease in genetic score over both 40,000 and 15,000 y in the capture-shotgun dataset (P < 1 × 10⁻⁴ for both), although the decrease in the shotgun dataset was not significant (P = 0.17), probably reflecting a lack of power from the small number of ancient samples and SNPs.

Ancestry and Selection Have Different Effects on the Histories of Each Variant.

Next, we investigated the evolution of each variant independently. We separated the effects of ancestry and selection in the data by regressing presence or absence of the alternate allele for each pigmentation-associated SNP on both date and ancestry, inferred using principal component analysis (PCA) (63). Significant effects of ancestry on frequency can be interpreted as changes in allele frequency that can be explained by changes in ancestry. Significant changes in frequency over time, after accounting for ancestry, can be interpreted as evidence for selection occurring during the analyzed time period.

The SNP rs16891982 at the SLC45A2 locus shows the strongest evidence for selection across analyses (Fig. 2), but we note this does not necessarily imply positive selection over the entire time transect. We observed little evidence for demographic transitions in driving the increase in light allele frequency over time for this SNP. In addition, we found robust signals of selection for the light allele of rs1126809 near TYR, rs12913832 and rs1635168 near HERC2, and rs7109255 near GRM5 (Fig. 2 B and C). In some cases—for example at rs6120849 near EDEM2 and rs7870409 near RALGPS1—light alleles decreased in frequency more than can be explained by changes in ancestry (Fig. 2 B and C). The dark allele at rs6120849 (EDEM2) is also significantly associated with several other traits including increased sitting and standing height in the UK Biobank, increased blood creatinine, and increased heel bone mineral density (64). The dark allele of rs7870409 (RALGPS1) is associated with increased arm and trunk fat percentage (65). These observations raise the possibility of pleiotropic effects constraining the evolution of these loci or of spurious pleiotropy due to uncorrected population stratification in the GWAS.

Fig. 2.

Plot of P values for ancestry and time for individual skin pigmentation SNPs. The dashed lines indicate fifth percentile of P values. UK Biobank SNPs using the shotgun dataset are plotted in A and capture-shotgun dataset in B and C. Manually curated SNPs using the shotgun dataset are plotted in D and capture-shotgun dataset in E and F. SNPs are colored according to whether the light allele is increasing over time (red) or decreasing (blue), with saturation determined by the magnitude of change.

On the other hand, the changes in frequencies of several variants appear to have been driven largely by changes in ancestry. For example, rs4778123 near OCA2, rs2153271 near BNC2, and rs3758833 near CTSC show evidence that their changes in frequency were consistent with changes in genome-wide ancestry (Fig. 2 B and C). At SLC24A5, rs2675345 shows evidence of change with both ancestry and time, suggesting that even after the spread of the light allele from Anatolia into Western Europe in the Neolithic (7) selection continued to occur post-admixture. Again, not all of these cases involve an increase in the frequency of the light pigmentation allele over time. The light allele of rs4778123 (OCA2) was at high frequency in hunter-gatherers but lower in later populations (SI Appendix, Fig. S6B). From the manually curated set of SNPs (Fig. 2 D–F), rs12203592 near IRF4 also displays a marked effect of ancestry with higher light allele frequency in hunter-gatherers (SI Appendix, Fig. S6E). While rs12203592 was not present in the UK Biobank summary statistics, another SNP at the IRF4 locus, rs3778607, was present but with a smaller ancestry effect (SI Appendix, Fig. S7).

Lack of PBS Selection Signal across Pigmentation SNPs in Source Populations of Europeans.

As we described in the previous section, the frequencies of several skin pigmentation SNPs appear to have been driven by changes in ancestry. This observation suggests that their frequencies have diverged between the European source populations and then subsequently been driven by admixture. Frequency differences across the source populations might reflect the action of either genetic drift or selection. To examine this question, we looked for evidence of selection at skin pigmentation SNPs in each ancestral population using the population branch statistic (PBS) test (66). We ran ADMIXTURE with K = 3 on the capture-shotgun dataset (SI Appendix, Fig. S8) and used the inferred source population allele frequencies to calculate PBS for each source in 20-SNP nonoverlapping windows, with the 1000 Genomes CHB (Han Chinese) and YRI (Yoruba) populations as outgroups.

Few of the skin pigmentation loci show extreme PBS values (Fig. 3), and even fewer show evidence of divergent frequencies across ancient groups (SI Appendix, Fig. S10). On the Early Farmer and Steppe ancestry branches, but not the hunter-gatherer branch, the SLC24A5 locus exhibited the strongest signal (Fig. 3 B and C), indicating selection at the locus in both Early Farmer and Steppe source populations. The RABGAP1 locus also exhibited an elevated signal of selection in Early Farmer and Steppe, which remained when using a 40-SNP window size (SI Appendix, Fig. S11 B and C). In all three ancestry groups, we observed an elevated PBS at the OCA2 locus, with the most extreme value on the hunter-gatherer branch. However, unlike the SLC24A5 variant, neither of these SNPs (rs644490 and rs9920172) showed a substantial effect of ancestry from the individual SNP regression analysis. As a whole, the PBS values of the 170 skin pigmentation SNPs do not significantly deviate from the genome-wide distribution of PBS for any of the ancient source populations (SI Appendix, Fig. S12).

Fig. 3.

Joint PBS distributions for 20-SNP windows across the genome for GBR-CHB-YRI on the x axis and X-CHB-YRI on the y axis, with X being (A) hunter-gatherer, (B) Early Farmer, and (C) Steppe. The boxes represent windows centered around UK Biobank skin pigmentation SNPs and are colored redder according to how much higher the light allele frequency is in X compared to YRI. Nearest genes are labeled. The orange and red lines represent the top 1 and 0.1 percentiles.

Signals of Selection Are Restricted to the SNPs with the Largest Effect Sizes.

Since our results suggest that pigmentation-associated variants exhibit different changes in allele frequency, we further examined the signal of selection found in Fig. 1. In general, SNPs with larger GWAS effect sizes changed more in frequency over time than those with small effect sizes (P = 2.8 × 10⁻¹⁸; adjusted R² = 0.36) (SI Appendix, Fig. S13A). To understand the contribution of these large GWAS effect size SNPs, we iteratively removed SNPs from the polygenic score calculations (Fig. 4A). Removing the SNP rs16891982 at SLC45A2, we still found a significant decrease of score over time (P < 1 × 10⁻⁴), but the estimated rate of change (β_time) decreased by 39%. Removing the top two SNPs at the SLC45A2 and SLC24A5 locus further attenuated the signal with a decrease in β_time of 58% (P = 3.96 × 10⁻⁴). We removed the five SNPs with the largest GWAS effect sizes and found the effect of time attenuated (P = 0.026) with a decrease in β_time of 78%. Removing the top 10 SNPs attenuated the signal even more (P = 0.050) with an 81% decrease and removal of 15 or 20 SNPs abolished the signal (P = 0.147, P = 0.52).

Fig. 4.

(A) Regressions of genetic score based on UK Biobank SNPs using the capture-shotgun dataset over date, with scores using all SNPs and iteratively removing top GWAS effect size SNPs. (B) Q_x empirical −log₁₀(P values) using UK Biobank skin pigmentation-associated variants with all 1000 Genomes populations. Different numbers of SNPs were used to calculate Q_x, which were ordered by GWAS-estimated effect size. (C) Variance explained by UK Biobank SNPs with nearest gene labeled between UK and YRI populations (y axis) and within the UK population (x axis).

Using present-day populations from the 1000 Genomes Project (67), we tested for polygenic selection with the Q_x statistic (10), which measures overdispersion of trait-associated SNPs relative to the genome-wide expectation. We performed the test with different numbers of SNPs ordered by largest to smallest absolute GWAS effect size (Fig. 4B). Using all 170 SNPs in the test, we failed to observe a signal of polygenic selection (P = 0.27). However, when we restricted to the top five largest GWAS effect size SNPs, we found a significant signal (P < 2.5 × 10⁻⁷). We observed significant Q_x statistics at P < 0.05 up to the top 30 SNPs, but it appears this signal is driven by the top SNPs of largest effect. After removing the top 5 SNPs and testing the remaining 25 top SNPs, the signal of polygenic selection disappeared (P = 0.59). Examining the predicted variance explained per SNP (Fig. 4C and SI Appendix, Fig. S14), we predict that, of the UK Biobank SNPs, two SNPs at SLC45A2 and SLC24A5 make the largest contribution to the difference between West African and European populations. Most UK Biobank skin pigmentation-associated SNPs are predicted to contribute relatively little to the between-population variance.

Discussion

Large whole-genome ancient DNA datasets have, in the past few years, allowed us to track the evolution of variation associated with both simple and complex traits (68, 69). The majority of samples are from Western Eurasia, and we focus on that region, noting that a parallel process of selection for light skin pigmentation has acted in East Asian populations (9, 70). In West Eurasia, selection for skin pigmentation appears to have been dominated by a small number of selective sweeps at large-effect variants. There are a number of possible explanations for this. Many of the variants detected by the UK Biobank GWAS may have such small effects that they were effectively neutral or experienced such weak selection that we cannot detect it. Other variants may not have responded to selection because of pleiotropic constraint. The five SNPs with the largest effect sizes in UK Biobank do have some have small but significant associations with anthropometric phenotypes (for example, rs1805007 at MC1R with standing height), but by far their most significant associations are with hair color, facial aging, tanning, melanoma risk, and other phenotypes that are likely related to pigmentation (64). Finally, GWAS effect size estimates in European populations today may not reflect the impact these variants had in the past due to epistatic or gene–environment interactions. This would not affect the individual SNP time series and PBS analyses since they do not rely on effect sizes as weights but would reduce power for the weighted polygenic score and Q_x analyses.

Our analyses centered on 170 skin pigmentation-associated SNPs that are present on the 1240K capture array. To check for possible bias introduced by the capture array, we checked how many of the 242 SNPs used in the shotgun analysis were well tagged by the capture array. Of the 242 SNPs, 33 are shared, a further 67 are tagged at r² ≥ 0.8, and a further 19 SNPs are tagged at r² ≥ 0.5. However, while the 142 SNPs that are not tagged at r² ≥ 0.8 by the capture array do contain some moderate effect size SNPs, they do not contribute to the selection signal (P = 0.31) (SI Appendix, Fig. S15D). Therefore, although the capture array does not capture all of the variation contributing to present-day skin pigmentation variation, it does capture the vast majority of the variation that contributed to the evolution of the phenotype, justifying the use of the capture-shotgun dataset for detailed investigation of the evolutionary trends.

We find little evidence of parallel selection on independent haplotypes at skin pigmentation loci, suggesting that that differences in allele frequency across ancestry groups were mostly due to genetic drift. One exception is that the light allele at SLC24A5 was nearly fixed in both Early Farmer and Steppe ancestry populations due to selection. However, even for this variant we observe a signal of ongoing selection in our data even after admixture with hunter-gatherers, indicating continued selection after admixture. This is analogous to the rapid selection at the same locus for the light allele introduced via admixture into the KhoeSan, who now occupy southern Africa (12, 71).

We are also able to test previous claims about selection on particular pigmentation genes. We find no evidence of positive selection in Europeans at the MC1R locus in contrast to previous reports (72, 73). Among UK Biobank SNPs, rs1805007 near MC1R explains a relatively large amount of variation within the UK but is predicted to explain relatively little of the variation between Europe and West Africa (Fig. 4C). The TYRP1 locus has been previously identified as a target of selection in Europeans (5, 8, 9, 74, 75), although some studies (76, 77) have questioned this finding. Our analysis shows some support for selection at this locus, with a 20-SNP window centered around rs1325132 being in the top 1.1% of genome-wide PBS windows on the hunter-gatherer lineage. Some studies (76, 78) detect a signal of recent selection in Europeans at the KITLG SNP rs12821256 that is functionally associated with blond hair color (79). We find no evidence of selection on the West Eurasian lineage at this SNP in our time series analysis, but this may reflect a lack of power to detect a relatively small change in frequency. Finally, at the OCA2 locus, we recapitulate an observation of independent selection in Europeans and East Asians (70, 80) (SI Appendix, Fig. S9). In the ancient populations, we found an elevated PBS signal around rs9920172 that was most prominent in hunter-gatherers but also elevated in Early Farmer and Steppe ancestry populations (Fig. 3), suggesting a relatively early episode of selection in West Eurasians.

Relatively dark skin pigmentation in Early Upper Paleolithic Europe would be consistent with those populations being relatively poorly adapted to high-latitude conditions as a result of having recently migrated from lower latitudes. On the other hand, although we have shown that these populations carried few of the light pigmentation alleles that are segregating in present-day Europe, they may have carried different alleles that we cannot now detect. As an extreme example, Neanderthals and the Altai Denisovan individual show genetic scores that are in a similar range to Early Upper Paleolithic individuals (SI Appendix, Table S1), but it is highly plausible that these populations, who lived at high latitudes for hundreds of thousands of years, would have adapted independently to low UV levels. For this reason, we cannot confidently make statements about the skin pigmentation of ancient populations.

Our study focused, for reasons of data availability, on the history of skin pigmentation evolution in West Eurasia. However, there is strong evidence that a parallel trend of adaptation to low UVB conditions occurred in East Asia (15, 70, 81, 82). Less is known about the loci that have been under selection in East Asia, aside from some variants at OCA2 (80 81–82). Similarly, multiple studies have documented selection for both lighter and darker skin pigmentation in parts of Africa (12, 13, 83). Future work should test whether the process of adaptation in other parts of the world was similar to that in Europe. The lack of known skin pigmentation loci in scans of positive selection in East Asian populations (84, 85) raises the possibility that selection may have been more polygenic in East Asians than in Europeans. Finally, the evidence for polygenic directional selection on other complex traits in humans is inconclusive (86, 87). We suggest that detailed studies of other phenotypes using ancient DNA can be helpful at more generally identifying the types of processes that are important in human evolution.

Methods

Capture-Shotgun Ancient DNA Dataset.

We downloaded version 37.2 of the publicly available datasets of 2,107 ancient and 5,637 present-day (Human Origins dataset) samples from the Reich Lab website: https://reich.hms.harvard.edu/downloadable-genotypes-present-day-and-ancient-dna-data-compiled-published-papers. Ancient individuals were treated as pseudohaploid (i.e., carry only one allele) because most samples are low coverage. For many of these samples, 1,233,013 sites were genotyped using an in-solution capture method (88), while those with whole-genome shotgun sequence data were genotyped at the same set of sites. For our regression models, we included ancient individuals from West Eurasia, defined here as the region west of the Ural Mountains (longitude, <60° E) and north of 35° N (SI Appendix, Fig. S1). We excluded samples with less than 0.1× coverage. We removed duplicate and closely related samples (e.g., first-degree relatives), by calculating pairwise identity by state (IBS) between ancient individuals. For each individual, we identified a corresponding individual with the highest IBS. Based on the distribution of highest IBS values, we identified the pairs with IBS Z score >7 as duplicates and retained the individual with fewest missing sites. We also manually removed related samples that were explicitly annotated, selecting the highest coverage representative. Finally, we incorporated additional ancient samples from the Iberian Peninsula from two recent papers (36, 45), applying the same criteria for coverage and relatedness. Ultimately, we compiled a list of 1,158 ancient pseudohaploid individuals in our analyses covering a time span of the last 40,000 y. Only 11 individuals were dated to >15,000 y BP, and 1,147 individuals lived during the last 15,000 y.

Shotgun Ancient DNA Dataset.

We organized a dataset of samples that were shotgun sequenced without enrichment using the capture array (23, 25 26 27 28 29 30 31–32, 35, 39, 42, 45, 47 48 49 50–51, 54, 56, 58, 60 61 62 63 64 65 66–67). To generate the first 10 PCs from sites on the 1240K capture array, we combined shotgun samples with available genotype data from v37.2 of the Reich laboratory dataset and 15 individuals that we ourselves pulled down from BAM files. We manually removed duplicate samples in the shotgun dataset, preserving the sample with higher coverage. We checked for first-degree relatives in the dataset by considering familial annotations but found none. The final shotgun dataset included 249 individuals.

Skin Pigmentation SNP Curation.

We obtained summary statistics for UK Biobank GWAS for skin color (data field 1717) from the publicly available release by the Neale Lab (version 3, Manifest Release 20180731) (14). The GWAS measured self-reported skin color as a categorical variable (very fair, fair, light olive, dark olive, brown, black). To identify genome-wide significant and independent SNPs, we performed clumping using PLINK v1.90b6.6 (89) with 1000 Genomes GBR as an LD reference panel (--clump-p1 5 × 10⁻⁸ --clump-r2 0.05 --clump-kb 250), and followed up with clumping based on physical distance to exclude SNPs within 100 kb of each other. We made two separate lists of UK Biobank SNPs for the shotgun and capture-shotgun datasets because the capture-shotgun dataset was restricted to the 1240K array sites. For the capture-shotgun dataset, we intersected all UK Biobank SNPs with 1240K array SNPs before identifying 170 independent and genome-wide significant SNPs (Dataset S1A), which were used in Figs. 1 B and C, 2 B and C, 3, and 4. For the shotgun dataset, we identified 242 independent and genome-wide significant SNPs (Dataset S1B), which were used in Figs. 1A and 2A. For SI Appendix, Fig. S4, we identified 93 SNPs from the Neale Lab GWAS (Dataset S1F) by picking the top SNPs that are genome-wide significant from nearly independent LD blocks in the human genome specified beforehand (62). For SI Appendix, Fig. S3, we identified 176 SNPs (Dataset S1G) using the same clumping parameters with UK Biobank GWAS summary statistics calculated with fastGWA (61).

We also manually curated a list of skin pigmentation-associated SNPs from the literature (Dataset S1C). We identified 12 suitable studies (12, 13, 15, 80, 90 91 92 93 94 95 96–97), 9 of which were GWAS conducted in diverse populations (Dataset S1D). We did not include SNPs from all of the considered papers for our analysis because some studies reported associations that were not genome-wide significant (P < 5 × 10⁻⁸). However, we included subsignificant SNPs from a GWAS in Europeans (95) with evidence of replication (P < 5 × 10⁻⁸) in the UK Biobank GWAS. For a GWAS in African populations (12), we selected a SNP for each LD block, which was defined by the study authors, with the greatest support based on multiple lines of evidence (e.g., P value and functional annotations). For a GWAS in South Asians (94), we picked SNPs that were 200 kb away from each other and based the selection of SNPs that were in physical proximity by considering P value. The other studies presented a set of independent SNPs that we considered for inclusion. Lastly, we manually clumped the SNPs curated across studies, selecting SNPs 200 kb apart with more statistical and/or functional evidence than nearby SNPs. Specific details about the choices made on which SNPs to exclude can be found in the “Notes” column for highlighted SNPs in Dataset S1C. We ended with a final list of 18 SNPs for the manually curated set (Dataset S1E), 10 of which are present on the 1240K capture array. We also made pseudohaploid calls for each of the 18 SNPs in the shotgun dataset, picking a random allele from the reads at each site as for the overall dataset.

ADMIXTURE Analysis on Capture-Shotgun Data.

We performed unsupervised ADMIXTURE (98) on the dataset of ancient individuals with K = 3, which we found to produce the lowest cross-validation error for 2 < K < 9. The three identified clusters could easily be identified as corresponding to hunter-gatherer, Early Farmer, and Steppe (also referred to as Yamnaya) ancestry (SI Appendix, Fig. S8).

Time Series Analysis of Genetic Scores.

We calculated genetic scores for each individual from present-day 1000 Genomes populations (67) and ancient individuals. We computed the scores in two ways. First, we weighted by the GWAS-estimated effect sizes. Because of variable coverage across ancient samples, not all SNPs were present in a given sample. To account for missing information in the creation of weighted scores, we devised a weighted proportion in which we divided the realized score over the maximum possible score given the SNPs present in the sample: $S c o r e_{w e i g h t e d} = \sum_{i = 1}^{m} d_{i} β_{i} / \sum_{i = 1}^{m} β_{i}$ , where m is the number of skin pigmentation SNPs genotyped in the individual, d_i is the presence of the dark allele at the ith SNP, and β_i is the GWAS-estimated effect size (Fig. 1 A–C). For the manually curated list of SNPs where we did not have comparable effect size estimates, we computed an unweighted score, effectively assuming that all variants had the same effect size. This score is the proportion of dark alleles an individual carries out of the SNPs used in the construction of the score: $S c o r e_{u n w e i g h t e d} = (1 / m) \sum_{i = 1}^{m} d_{i}$ , where m is the number of SNPs genotyped in the individual and d_i is the presence of the dark allele at the ith SNP (Fig. 1 D–F).

We used logistic regression to examine the association between weighted and unweighted genetic score and time for all ancient samples. We fitted separate models for ancient samples in the shotgun and capture-shotgun datasets. We included ancestry as a covariate in the model by including the first 10 PCs. PCA was performed using smartpca v16000 (63) to generate PCs from present day populations and we projected ancient individuals onto these axes of variation. For the shotgun dataset and the capture-shotgun dataset that included all samples (40,000 y BP), we used 1000 Genomes samples. For the capture-shotgun dataset that restricted to samples later than 15,000 BP, we used West Eurasians from the Human Origins dataset (16).

We model the score of each individual as the proportion of successes in a binomial sample of size m_i. That is, $m_{i} S_{i} \sim B i n o m i a l (m_{i}, p_{i})$ , where, for individual i, m_i is the number of SNPs genotyped, S_i is the (weighted or unweighted) score, the probability of success p_i is given by the following:

log (\frac{p_{i}}{1 - p_{i}}) = α + β_{d a t e} d a t e_{i} + β_{l a t} l a t_{i} + β_{l o n} l o n_{i} + β_{P C 1} P C 1_{i} + \dots + β_{P C 10} P C 10_{i},

and date_i, lat_i, and lon_i are the dates, latitude, and longitude of each individual. For the weighted score, m_iS_i is noninteger, but the likelihood can be computed in the same way.

For the stratified analysis, we divided the capture-shotgun dataset into mutually exclusive ancestry groups. We categorized individuals for this analysis as hunter-gatherer if they carried over 60% of the ADMIXTURE-estimated hunter-gatherer component. For placement into the Early Farmer group, we required that an individual carry over 60% of that component. Individuals we categorized as Steppe had over 30% of the ADMIXTURE component and were dated to less than 5,000 y BP. Individuals that are more recent than 5,000 y BP and have at least 30% of the Steppe ADMIXTURE component were classified as Steppe even if they had more than 60% of the Early Farmer component. Based on these cutoffs, there were 102 hunter-gatherer, 499 Early Farmer, and 478 Steppe ancestry assigned individuals. We used the same logistic regression model as above for the stratified analysis to control for ancestry.

Because the fitted model parameters are overdispersed (SI Appendix, Fig. S2), we computed P values from a genome-wide empirical null distribution for β_date. We made this null distribution by running the regression model described above on scores from sets of random, frequency-matched (±1%) SNPs across the genome, maintaining the same effect sizes for the weighted score. We matched the derived allele frequency based on the EUR superpopulation frequencies from 1000 Genomes. We reported P values based on 10,000 random samples.

Time Series Analysis of Individual SNP Allele Frequencies.

We performed logistic regression for each individual SNP using date of the sample and ancestry as covariates. That is, the probability that the haplotype sampled from individual carries the derived allele is p_i, where

log (\frac{p_{i}}{1 - p_{i}}) = α + β_{d a t e} d a t e_{i} + β_{P C 1} P C 1_{i} + \dots + β_{P C 10} P C 10_{i} .

We compared the full model above to a nested model with no principal components by performing a likelihood ratio test [in R we use anova(nested model, full model, test = ‘Chisq’)] to obtain a P value for the ancestry term which encompasses β_PC1…β_PC10. To obtain a P value for β_date, we compare a nested model without β_date to the full model.

Q_x Polygenic Selection Test.

We used the Q_x test for directional, polygenic selection on skin pigmentation (10). For this test, we used 170 skin pigmentation SNPs obtained from the UK Biobank in all 1000 Genomes populations. We restricted sites to those on the 1240K and constructed a covariance matrix from a total of one million SNPs. To calculate empirical P values, we sampled a total of 500,000 null genetic values matching skin pigmentation SNPs based on a ±2% frequency on the minor allele. However, we report one million runs for the top seven SNPs and two million runs for the top five and six to better distinguish the deviation of the tested SNP sets from the expectation under drift.

Variance in Skin Pigmentation Explained by Individual SNPs.

We estimated the variance explained in the phenotype by a SNP within the UK population using the following:

V E_{w i t h i n} = 2 β^{2} f (1 - f),

where β is the UK Biobank GWAS-estimated effect size and f is the allele frequency in the GBR population from 1000 Genomes. We calculated the proportion of variance explained as follows:

P V E_{w i t h i n} = \frac{V E_{w i t h i n}}{V E_{w i t h i n} + 2 N σ^{2} f (1 - f)},

where N is the GWAS sample size (356,530) and σ is the SE of the estimated β (99). To estimate the variance explained between GBR and YRI, we calculated the between population variance as follows:

V E_{b e t w e e n} = β^{2} (f_{a v g} (1 - f_{a v g}) - 0.5 [f_{1} (1 - f_{1}) + f_{2} (1 - f_{2})]),

where f₁ and f₂ are the allele frequencies of GBR and YRI, β is the UK Biobank GWAS-estimated effect size, and f_avg = 0.5(f₁ + f₂).

Population Branch Statistic.

We calculated the PBS (66) for different sets of ancient groups and present-day groups. For the ancient populations, we used inferred source population allele frequencies from ADMIXTURE (the P matrix), whereas for present-day groups, we used the observed allele frequencies estimated from 1000 Genomes YRI, CHB, and GBR. Using these frequencies, we estimated pairwise Hudson’s F_ST between groups for the PBS test. The estimator is calculated as follows:

\hat{F_{S T}} = \frac{{(p_{1} - p_{2})}^{2} - \frac{p_{1} (1 - p_{1})}{n_{1} - 1} - \frac{p_{2} (1 - p_{2})}{n_{2} - 1}}{p_{1} (1 - p_{2}) + p_{2} (1 - p_{1})},

where for population i, n_i is the sample size and p_i is the allele frequency of the sample. We excluded SNPs that had a missing rate of >90% in the capture-shotgun dataset. Because of variable coverage in ancient genomes, not all sites were used in the calculation. F_ST was calculated using 20- and 40-SNP nonoverlapping windows throughout the genome. F_ST values were transformed to calculate genetic divergence between populations as T = −log(1 – F_ST). We calculated PBS for ancient group X (i.e., hunter-gatherers, Early Farmers, or Steppe) using 1000 Genomes Han Chinese (CHB) and Yoruba (YRI) as follows:

P B S_{X} = \frac{T^{X, Y R I} + T^{X, C H B} - T^{Y R I, C H B}}{2} .

We identified nearby genes by examining the National Center for Biotechnology Information RefSeq track for assembly GRCh37/hg19 (downloaded from https://genome.ucsc.edu/cgi-bin/hgTables) for genes that overlap with windows with high PBS values. We note that the gene names are used simply as labels for each locus and do not necessarily represent the causal genes.

Acknowledgements

We thank Sarah Tishkoff, Alexander Platt, Bárbara Bitarello, Samantha Cox, Arslan Zaidi, and Andrew Chen for helpful comments on earlier versions of this manuscript. This research was supported by a Research Fellowship from the Alfred P. Sloan Foundation (FG-2018-10647), a New Investigator Research Grant from the Charles E. Kaufman Foundation (KA2018-98559), National Institute of General Medical Sciences Award R35GM133708, and National Research Service Award T32GM008216. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

The authors declare no competing interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2009227118/-/DCSupplemental.

Code Availability.

Scripts used to generate the main results and figures of this paper are available at GitHub, https://github.com/mathilab/SkinPigmentationCode.

References

N. G. Jablonski, G. Chaplin, The evolution of human skin coloration. J. Hum. Evol. 39, 57–106 (2000).

N. G. Jablonski, G. Chaplin, Colloquium paper: Human skin pigmentation as an adaptation to UV radiation. Proc. Natl. Acad. Sci. U.S.A. 107 (suppl. 2), 8962–8968 (2010).

A. A. Zaidi, Investigating the case of human nose shape and climate adaptation. PLoS Genet. 13, e1006616 (2017).

F. Sassi, C. Tamone, P. D’Amelio, D. Vitamin, Nutrient, hormone, and immunomodulator. Nutrients 10, 1656 (2018).

B. F. Voight, S. Kudaravalli, X. Wen, J. K. Pritchard, A map of recent positive selection in the human genome. PLoS Biol. 4, e72 (2006).

P. C. Sabeti.; International HapMap Consortium, Genome-wide detection and characterization of positive selection in human populations. Nature 449, 913–918 (2007).

I. Mathieson, Genome-wide patterns of selection in 230 ancient Eurasians. Nature 528, 499–503 (2015).

J. K. Pickrell, Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19, 826–837 (2009).

O. Lao, J. M. de Gruijter, K. van Duijn, A. Navarro, M. Kayser, Signatures of positive selection in genes associated with human skin pigmentation as revealed from analyses of single nucleotide polymorphisms. Ann. Hum. Genet. 71, 354–369 (2007).

J. J. Berg, G. Coop, A population genetic signal of polygenic adaptation. PLoS Genet. 10, e1004412 (2014).

J. K. Pritchard, J. K. Pickrell, G. Coop, The genetics of human adaptation: Hard sweeps, soft sweeps, and polygenic adaptation. Curr. Biol. 20, R208–R215 (2010).

N. G. Crawford, Loci associated with skin pigmentation identified in African populations. Science 358, eaan8433 (2017).

A. R. Martin, An unexpectedly complex architecture for skin pigmentation in Africans. Cell 171, 1340–1353.e14 (2017).

Neale Lab, UK Biobank GWAS. www.nealelab.is/uk-biobank. Accessed 25 October 2018.

K. Adhikari, A GWAS in Latin Americans highlights the convergent evolution of lighter skin pigmentation in Eurasia. Nat. Commun. 10, 358 (2019).

I. Lazaridis, Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 513, 409–413 (2014).

I. Lazaridis, The evolutionary history of human populations in Europe. Curr. Opin. Genet. Dev. 53, 21–27 (2018).

C. Gamba, Genome flux and stasis in a five millennium transect of European prehistory. Nat. Commun. 5, 5257 (2014).

G. González-Fortes, Paleogenomic evidence for multi-generational mixing between Neolithic farmers and Mesolithic hunter-gatherers in the lower Danube basin. Curr. Biol. 27, 1801–1810.e10 (2017).

T. Günther, Ancient genomes link early farmers from Atapuerca in Spain to modern-day Basques. Proc. Natl. Acad. Sci. U.S.A. 112, 11917–11922 (2015).

Z. Hofmanová, Early farmers from across Europe directly descended from Neolithic Aegeans. Proc. Natl. Acad. Sci. U.S.A. 113, 6886–6891 (2016).

E. R. Jones, Upper Palaeolithic genomes reveal deep roots of modern Eurasians. Nat. Commun. 6, 8912 (2015).

E. R. Jones, The Neolithic transition in the Baltic was not driven by admixture with early European farmers. Curr. Biol. 27, 576–582 (2017).

G. M. Kılınç, The demographic development of the first farmers in Anatolia. Curr. Biol. 26, 2659–2666 (2016).

A. Keller, New insights into the Tyrolean Iceman’s origin and phenotype as inferred by whole-genome sequencing. Nat. Commun. 3, 698 (2012).

M. Krzewińska, Ancient genomes suggest the eastern Pontic-Caspian steppe as the source of western Iron Age nomads. Sci. Adv. 4, eaat4457 (2018).

M. Krzewińska, Genomic and strontium isotope variation reveal immigration patterns in a Viking Age town. Curr. Biol. 28, 2730–2738.e10 (2018).

M. E. Allentoft, Population genomics of Bronze Age Eurasia. Nature 522, 167–172 (2015).

I. Lazaridis, Genetic origins of the Minoans and Mycenaeans. Nature 548, 214–218 (2017).

I. Lazaridis, Genomic insights into the origin of farming in the ancient Near East. Nature 536, 419–424 (2016).

M. Lipson, Parallel palaeogenomic transects reveal complex genetic history of early European farmers. Nature 551, 368–372 (2017).

R. Martiniano, Genomic signals of migration and continuity in Britain before the Anglo-Saxons. Nat. Commun. 7, 10326 (2016).

R. Martiniano, The population genomics of archaeological transition in west Iberia: Investigation of ancient substructure using imputation and haplotype-based methods. PLoS Genet. 13, e1006852 (2017).

I. Mathieson, The genomic history of southeastern Europe. Nature 555, 197–203 (2018).

I. Olalde, Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European. Nature 507, 225–228 (2014).

I. Olalde, The genomic history of the Iberian Peninsula over the past 8000 years. Science 363, 1230–1234 (2019).

I. Olalde, The Beaker phenomenon and the genomic transformation of northwest Europe. Nature 555, 190–196 (2018).

I. Olalde, A common genetic origin for early farmers from Mediterranean Cardial and Central European LBK cultures. Mol. Biol. Evol. 32, 3132–3142 (2015).

C. E. G. Amorim, Understanding 6th-century barbarian social organization and migration through paleogenomics. Nat. Commun. 9, 3547 (2018).

A. Omrak, Genomic evidence establishes Anatolia as the source of the European Neolithic gene pool. Curr. Biol. 26, 270–275 (2016).

L. Saag, Extensive farming in Estonia started through a sex-biased migration from the steppe. Curr. Biol. 27, 2185–2193.e6 (2017).

S. Schiffels, Iron age and Anglo-Saxon genomes from East England reveal British migration history. Nat. Commun. 7, 10408 (2016).

M. Sikora, Ancient genomes show social and reproductive behavior of early Upper Paleolithic foragers. Science 358, 659–662 (2017).

P. Skoglund, Genomic diversity and admixture differs for Stone-Age Scandinavian foragers and farmers. Science 344, 747–750 (2014).

V. Villalba-Mouco, Survival of late Pleistocene hunter-gatherer ancestry in the Iberian Peninsula. Curr. Biol. 29, 1169–1177.e7 (2019).

M. Unterländer, Ancestry and demography and descendants of Iron Age nomads of the Eurasian steppe. Nat. Commun. 8, 14615 (2017).

L. M. Cassidy, Neolithic and Bronze Age migration to Ireland and establishment of the insular Atlantic genome. Proc. Natl. Acad. Sci. U.S.A. 113, 368–373 (2016).

P. B. Damgaard, 137 ancient human genomes from across the Eurasian steppes. Nature 557, 369–374 (2018).

P. de Barros Damgaard, The first horse herders and the impact of early Bronze Age steppe expansions into Asia. Science 360, eaar7711 (2018).

D. M. Fernandes, A genomic Neolithic time transect of hunter-farmer admixture in central Poland. Sci. Rep. 8, 14879 (2018).

Q. Fu, The genetic history of Ice Age Europe. Nature 534, 200–205 (2016).

Q. Fu, An early modern human from Romania with a recent Neanderthal ancestor. Nature 524, 216–219 (2015).

Q. Fu, Genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514, 445–449 (2014).

T. Günther, Population genomics of Mesolithic Scandinavia: Investigating early postglacial migration routes and high-latitude adaptation. PLoS Biol. 16, e2003703 (2018).

S. Brace, Ancient genomes indicate population replacement in Early Neolithic Britain. Nat. Ecol. Evol. 3, 765–771 (2019).

A. Seguin-Orlando, Genomic structure in Europeans dating back at least 36,200 years. Science 346, 1113–1118 (2014).

K. R. Veeramah, Population genomic analysis of elongated skulls reveals extensive female-biased immigration in Early Medieval Bavaria. Proc. Natl. Acad. Sci. U.S.A. 115, 3494–3499 (2018).

M. A. Yang, 40,000-year-old individual from Asia provides insight into early population structure in Eurasia. Curr. Biol. 27, 3202–3208.e9 (2017).

M. Rasmussen, The genome of a Late Pleistocene human from a Clovis burial site in western Montana. Nature 506, 225–229 (2014).

M. Gallego Llorente, Ancient Ethiopian genome reveals extensive Eurasian admixture throughout the African continent. Science 350, 820–822 (2015).

L. Jiang, A resource-efficient tool for mixed model association analysis of large-scale data. Nat. Genet. 51, 1749–1755 (2019).

T. Berisa, J. K. Pickrell, Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 32, 283–285 (2016).

N. Patterson, A. L. Price, D. Reich, Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).

G. McInnes, Global Biobank Engine: Enabling genotype-phenotype browsing for Biobank summary statistics. Bioinformatics 35, 2495–2497 (2019).

O. Canela-Xandri, K. Rawlik, A. Tenesa, An atlas of genetic associations in UK Biobank. Nat. Genet. 50, 1593–1599 (2018).

X. Yi, Sequencing of fifty human exomes reveals adaptation to high altitude. Science 329, 75–78 (2010).

The 1000 Genomes Project Consortium, A global reference for human genetic variation. Nature 526, 68–74 (2015).

S. L. Cox, C. B. Ruff, R. M. Maier, I. Mathieson, Genetic contributions to variation in human stature in prehistoric Europe. Proc. Natl. Acad. Sci. U.S.A. 116, 21484–21492 (2019).

S. Mathieson, I. Mathieson, FADS1 and the timing of human adaptation to agriculture. Mol. Biol. Evol. 35, 2957–2970 (2018).

H. L. Norton, Genetic evidence for the convergent evolution of light skin in Europeans and East Asians. Mol. Biol. Evol. 24, 710–722 (2007).

M. Lin, Rapid evolution of a skin-lightening allele in southern African KhoeSan. Proc. Natl. Acad. Sci. U.S.A. 115, 13324–13329 (2018).

S. A. Savage, Nucleotide diversity and population differentiation of the melanocortin 1 receptor gene, MC1R. BMC Genet. 9, 31 (2008).

C. Martínez-Cadenas, Simultaneous purifying selection on the ancestral MC1R allele and positive selection on the melanoma-risk allele V60L in south Europeans. Mol. Biol. Evol. 30, 2654–2665 (2013).

A. Urnikyte, Patterns of genetic structure and adaptive positive selection in the Lithuanian population from high-density SNP data. Sci. Rep. 9, 9163 (2019).

S. R. Grossman, A composite of multiple signals distinguishes causal variants in regions of positive selection. Science 327, 883–886 (2010).

A. J. Stern, P. R. Wilton, R. Nielsen, An approximate full-likelihood method for inferring selection and allele frequency trajectories from DNA sequence data. PLoS Genet. 15, e1008384 (2019).

J. M. de Gruijter, Contrasting signals of positive selection in genes involved in human skin-color variation from tests based on SNP scans and resequencing. Investig. Genet. 2, 24 (2011).

Y. Field, Detection of human adaptation during the past 2000 years. Science 354, 760–764 (2016).

C. A. Guenther, B. Tasic, L. Luo, M. A. Bedell, D. M. Kingsley, A molecular basis for classic blond hair color in Europeans. Nat. Genet. 46, 748–752 (2014).

M. Edwards, Association of the OCA2 polymorphism His615Arg with melanin content in East Asian populations: Further evidence of convergent evolution of skin pigmentation. PLoS Genet. 6, e1000867 (2010).

K. Eaton, Association study confirms the role of two OCA2 polymorphisms in normal skin pigmentation variation in East Asian populations. Am. J. Hum. Biol. 27, 520–525 (2015).

Z. Yang, A genetic mechanism for convergent skin lightening during recent human evolution. Mol. Biol. Evol. 33, 1177–1187 (2016).

F. Tekola-Ayele, Novel genomic signals of recent selection in an Ethiopian population. Eur. J. Hum. Genet. 23, 1085–1092 (2015).

Y. Okada, Deep whole-genome sequencing reveals recent selection signatures linked to evolution and disease risk of Japanese. Nat. Commun. 9, 1631 (2018).

Y. Yasumizu, Genome-wide natural selection signatures are linked to genetic risk of modern phenotypes in the Japanese population. Mol. Biol. Evol. 37, 1306–1316 (2020).

M. Sohail, Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife 8, 1–17 (2019).

J. J. Berg, Reduced signal for polygenic adaptation of height in UK Biobank. eLife 8, e39725 (2019).

W. Haak, Massive migration from the steppe was a source for Indo-European languages in Europe. Nature 522, 207–211 (2015).

C. C. Chang, Second-generation PLINK: Rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).

C. T. Miller, cis-Regulatory changes in Kit ligand expression and parallel evolution of pigmentation in sticklebacks and humans. Cell 131, 1179–1189 (2007).

J. Han, A genome-wide association study identifies novel alleles associated with hair color and skin pigmentation. PLoS Genet. 4, e1000074 (2008).

S. I. Candille, Genome-wide association studies of quantitatively measured skin, hair, and eye pigmentation in four European populations. PLoS One 7, e48294 (2012).

N. Hernandez-Pacheco, Identification of a novel locus associated with skin colour in African-admixed populations. Sci. Rep. 7, 44548 (2017).

R. P. Stokowski, A genomewide association study of skin pigmentation in a South Asian population. Am. J. Hum. Genet. 81, 1119–1132 (2007).

F. Liu, Genetics of skin color variation in Europeans: Genome-wide association studies with functional follow-up. Hum. Genet. 134, 823–835 (2015).

H. Nan, P. Kraft, D. J. Hunter, J. Han, Genetic variants in pigmentation genes, pigmentary phenotypes, and risk of skin cancer in Caucasians. Int. J. Cancer 125, 909–917 (2009).

S. Beleza, Genetic architecture of skin and eye color in an African-European admixed population. PLoS Genet. 9, e1003372 (2013).

D. H. Alexander, J. Novembre, K. Lange, Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).

H. Shim, A multivariate genome-wide association analysis of 10 LDL subfractions, and their response to statin treatment, in 1868 Caucasians. PLoS One 10, e0120758 (2015).