PLoS ONE
Home Clustering patterns mirror the geographical distribution and genetic history of Lemnos and Lesvos sheep populations
Clustering patterns mirror the geographical distribution and genetic history of Lemnos and Lesvos sheep populations
Clustering patterns mirror the geographical distribution and genetic history of Lemnos and Lesvos sheep populations

Competing Interests: The authors have declared that no competing interests exist.

Article Type: research-article Article History
Abstract

Elucidating the genetic variation and structure of Lemnos and Lesvos sheep is critical for maintaining local genetic diversity, ecosystem integrity and resilience of local food production of the two North Aegean islands. In the present study, we explored genetic diversity and differentiation as well as population structure of the Lemnos and Lesvos sheep. Furthermore, we sought to identify a small panel of markers with the highest discriminatory power to assign animals across islands. A total number of n = 424 (n = 307, Lemnos and n = 117, Lesvos) ewes, sampled from n = 24 herds dispersed at different geographic regions on the two islands, were genotyped with the 50K SNP array. Mean observed heterozygosity was higher (but not statistically significantly different) in Lesvos than in Lemnos population (0.384 vs. 0.377) while inbreeding levels were higher in Lemnos than Lesvos herds (0.065 vs. 0.031). Results of principal components along with that of admixture analysis and estimated genetic distances revealed genetic clusters corresponding to Lesvos and Lemnos origin and the existence of infrastructure within islands that were associated with geographical isolation and genetic history of the studied populations. In particular, genetic analyses highlighted three geographically isolated herds in Lemnos that are located at mountainous areas of the island and are characterized as representatives of the local sheep by historic data and reports. Admixture analysis also showed a shared genetic background between Lemnos and Lesvos sheep attributable to past gene flow. Little overall genetic differentiation was detected between the two island sheep populations, while 150 discriminatory SNPs could accurately assign animals to their origin. Present results are comparable with those reported in the worldwide sheep breeds, suggesting geography related genetic patterns across and within islands and the existence of the local Lemnos sheep.

Kominakis,Tarsani,Hager-Theodorides,Mastranestasis,Hadjigeorgiou,and Chiang: Clustering patterns mirror the geographical distribution and genetic history of Lemnos and Lesvos sheep populations

Introduction

Knowledge of the genetic variation within and between sheep populations is important for the identification of local genetic resources (i.e., individuals of local sheep breeds) to be maintained in animal genetics conservation efforts and the efficacy of applied selection programs. Preserving the genetic variability of livestock local genetic resources is important for ensuring food security under a changing climate and for sustaining productivity and resilience of agri-food chains. In Greece, sheep populations are reported to have been shaped by several factors such as geographical isolation, genetic drift, selection and crossbreeding [1]. Studies have already explored the genetic diversity and population structure of various sheep Greek breeds such as Frizarta [2], Chios [3], Karagouniko [3] and Lesvos [3,4].

The latter breed has been reported to cluster with other mountainous breeds in Greece [1]. It is placed on the center of an east–west geographical cline and is an intermediate between fat-tailed Asian and European sheep [5]. Using microsatellite markers, Mastranestasis et al. [4] estimated a heterozygote excess in the overall Lesvos sheep population as well as low genetic differentiation levels between herds attributable to high gene flow.

The Lesvos breed is a fat-tailed dairy sheep mainly located on the homonymous island of the North Aegean sea where about 260,000 purebred animals dispersed over 2,000 medium sized flocks (100–200 animals) are being kept under a semi-intensive production system (Ministry of Rural Development and Food, MRDF 2015 [6]). Apart from the Lesvos island, the breed is also found in the mainland and other Aegean islands where it is used to upgrade local sheep populations due to its pronounced milk productivity under harsh environmental conditions (dry heat areas and poor vegetation) [4].

Lemnos is another island of the North Aegean Sea neighbouring Lesvos, with a long tradition in small ruminants. Livestock farming on the island is dominated by dairy sheep totalling c.a. 60,000 ewes according to the EU’s Integrated Administration and Control System (IACS). Based on anecdotal reports, during the 1960s the local sheep population was extensively mated with the Lesvos breed. The Terra Lemnia project, aiming to evaluate the genetic resources of the island, conducted a survey among local livestock farmers regarding the characteristics of their farms. In this survey, 40 farmers were interviewed the majority of which characterized their sheep as crossbreds of the original local sheep mainly with Lesvos sheep and to a lesser extend with other Greek or foreign breeds. Nevertheless, pure-bred sheep belonging to the so called local Lemnos sheep were also reported at the mountainous Vigla region, at the northwestern part of the island. The survey data were supported by our own observations during visits to representative farms across the island, confirming high phenotypic variability within the Lemnos sheep population. This variability could be attributed either to extensive interbreeding or geographical isolation and local adaptation, with most distinctive that of the Vigla mountain (northwestern part of the island). In contrast to the Lesvos sheep breed that is largely purebred and relatively well characterized, the genetic makeup of the ‘modern’ Lemnos sheep remains largely unknown.

Sheep breeding is an important activity for both islands that shapes their ecosystem and contributes greatly to their agricultural, cultural and culinary heritage. In particular, milk from local ewes is used to produce cheeses, some of which are granted protected denomination of origin status e.g. the Ladotyri (Lesvos PDO), Kalathaki (Lemnos PDO) and Melichloro (Lemnos).

Driven from the importance of both sheep populations, in the present study we performed genetic analyses aiming at: (i) exploring the genetic makeup of Lemnos sheep and their relationship with the Lesvos breed, (ii) detecting population structure in the two populations and (iii) identifying a small panel of the most informative SNPs with high assignment power to predict island membership of animals. Current results are expected to shed more light in the evolutionary history of the two sheep populations and contribute to their better genetic management.

Material and methods

Ethics statement

All applicable ethical guidelines for the care and use of animals were followed and animal blood samples were collected by trained personnel under strict veterinary rules. All samples and data in our study were collected under the consent of the breeders. The protocol of the present study was approved by the Research Ethics and Ethics Committee of the Agricultural University of Athens (no 54/2020) according to the article 23 of law 4521/2018 of the Greek government.

Data and quality control

A total number of n = 433 sheep blood samples were collected from 24 herds of the two islands. All animals were genotyped with the Illumina Ovine 50K SNP array by Neogen Europe Ltd (Ayr, UK). First, the accuracy of SNP genotyping was assessed and following the manufacturer’s recommendations, genotypes presented with a GenCall score greater than 0.30 were considered called. We then performed quality control (QC) both at the animal level and marker level. At the animal level, n = 9 samples were excluded due to call rate lower than 0.95 resulting in a total number of 424 samples in 24 herds in Lemnos (n = 307 in 18 herds) and Lesvos (n = 117 in 6 herds). At the marker level, n = 3,762 markers of the 43,647 autosomal SNPs were excluded due to: call rate lower than 0.95, minor allele frequency (MAF) lower than 0.05, deviation from Hardy–Weinberg equilibrium (HWE) using a Fisher exact test p-value<10−4 and linkage disequilibrium (LD) r2 levels (r2>0.50, window size: 50 SNPs, increment: 5 SNPs). Application of QC criteria at the marker level resulted in a total number of 39,885 autosomal SNPs retained for further analyses. QC was performed using the SNP & Variation Suite software (version 8.8.3).

Genetic diversity and inbreeding

Measures of genetic diversity i.e. expected heterozygosity (HE) and observed heterozygosity (HO) were estimated per island and per herd within islands using the ‘gl.basic.stats’ function from the dartR R package [7]. The Brown and Forsythe’s test for homogeneity of variance was then applied to test for statistically significant differences of Ho between islands as well as differences between He and Ho, within islands. This test was performed using SAS ver9.0 (2002).

Individual inbreeding levels were assessed using two measures. First, the individual inbreeding coefficients (fis) were calculated using PLINK 1.9 [8] based on the observed versus expected number of homozygous genotypes. This inbreeding coefficient fis is equivalent to Wright’s within-subpopulation fixation index FIS, with negative and positive values denoting heterozygote excess and deficit, respectively. Then, runs of homozygosity (ROH) per animal were detected using the following criteria: (i) the minimum number of SNPs included in a ROH was set at 25, (ii) the minimum length of a ROH was set at 1 Mb, (iii) one heterozygous genotype and no more than one missing SNP were allowed per ROH and (iv) the maximum distance between consecutive SNPs was set to 1 Mb. This analysis was performed using the ‘consecutiveRUNS.run’ function in the detectRUNS R package [9]. Next, individual genomic inbreeding coefficients based on ROH (fROH) were calculated using the ‘Froh_inbreeding’ function in the detectRUNS R package [9]. Specifically, fROH were calculated from the sum of ROH lengths, divided by the total length of the autosomal genome covered by markers (2610 Mb), as proposed in [10]. Four fROH coefficients per animal were calculated using ROH of different length classes: (i) all ROH, (ii) 1 to 5 Mb, (iii) 5 to 20 Mb and (iv) >20 Mb. Finally, the degree of familial relationships i.e. shared ancestry between pairs of individuals sampled per herd was estimated using the π^=P(IBD=2)+0.5P(IBD=1) summary statistic obtained by PLINK 1.9 [8], where P(IBD = 2) and P(IBD = 1) are the probabilities of two individuals carrying 2 or 1 SNP alleles identical by descent, respectively. π^ values can range from 0 for unrelated samples to 1 for duplicated samples (or twins). Intermediate values such as 0.0625, 0.125, 0.25 and 0.50 denote fourth (e.g. double cousins), third (e.g. first cousins), second (e.g. half-sibs) and first degree relatives (e.g. full sibs, parent-offspring).

Detection of genetic structure

Detection of genetic structure was assessed using three different approaches. First, the genomic relationship matrix (GRM) between all pairs of individuals using genotypes of the 39,885 SNPs was calculated in PLINK 1.9 [8]. Elements of the GRM were then mapped to a color similarity matrix derived by Pearson moment correlations of the genomic relationships between pairs of individuals to produce a heat map. This heatmap was constructed via the Morpheus matrix visualization and analysis platform (https://software.broadinstitute.org/morpheus/).

Next, Principal Component Analyses (PCA) of animals’ genotypes was conducted to detect genetic clusters between and within island populations. Each time, the first 10 PCs were calculated using the ‘—pca’ command in PLINK 1.9 [8] and then pairwise plots of the main two PC were constructed to visually appraise clustering patterns in the data.

Finally, a variational Bayesian framework as implemented in the fastStructure [11] was employed to infer population structure using admixture analysis (AD). Variational Bayesian inference aims to repose the problem of inference as an optimization problem rather than a sampling problem and it is reported to deliver comparable results to ADMIXTURE in less runtime when using large numbers of genetic markers [11]. Here, K-values (i.e. the number of assumed ancestral populations) ranged from 2 to 8 and a five-fold CV procedure was performed to determine the optimal K-value presenting the lowest CV error. The default convergence criterion and prior were used as provided in the python script ‘structure.py’ (see: https://rajanil.github.io/fastStructure/). FastStructure results for each K cluster were graphically visualized using Distruct from CLUMPAK (http://clumpak.tau.ac.il/ [12]).

Genetic differentiation and distances

Genetic differentiation both between and within islands were explored via pairwise estimations of fixation index (FST) values between herds. Specifically, we used the ‘stamppFst’ function with a number of 1000 bootstraps and 0.95 confidence interval from the StAMPP [13] R package. Pairwise Nei’s genetic distances between herds were then calculated using the ‘stamppNeisD’ function in the StAMPP [13] R package and were graphically visualized using NeighborNet graphs constructed in SplitsTree5 [14].

Discriminant analysis

Prediction of group membership of sheep samples of the two islands using limited numbers of SNPs with highest discriminatory power, followed. To this end, first we selected limited numbers (25 to 150) of strongly differentiated SNPs between the two islands defined as markers with highest FST values between the two islands. Next, we performed PCA on the various sets of differentiated markers as a dimensionality-reduction technique in attempts to obtain smaller sets of variables i.e. Principal Components that contain most of the variance in the original sets. Finally, Linear Discriminant Analysis (LDA) on the constructed PCs was applied using normal-theory methods assuming unequal variances, prior probabilities proportional to sample sizes and the CrossValidate (CV) option to obtain CV error-rate estimates. LDA was performed using the DISCRIM procedure in SAS ver 9.0 (2002). When the CV classification is specified, PROC DISCRIM classifies each observation in the data set using the discriminant function computed from the other observations in the data set, excluding the observation being classified. Thus, every data point is reclassified as if it were a new unknown observation. This provides a more conservative accuracy assessment. A total number of 16 cases arising from combinations of different numbers of SNPs and constructed PCs were examined during LDA. Results of LDA are presented as misclassification error rates per island and on average.

Results

Genetic diversity and inbreeding levels

Based on estimates of inter-locus diversity, average HO was higher in Lesvos than in Lemnos population (0.384 vs. 0.377) (Table 1). Nevertheless, application of the Brown and Forsythe’s test for homogeneity of variance showed that this difference was not statistically significant (F = 192.12, df = 1, p<0.001). For Lemnos sheep, HO was less than HE (0.377 vs. 0.388, Brown and Forsythe’s test, F = 0.23, df = 1, p<0.64) while in the Lesvos sheep, Ho followed closely HW expectations (0.384 vs. 0.387, Brown and Forsythe’s test, F = 81.69, df = 1, p<0.001).

Table 1
Average observed heterozygosity (HO) and expected heterozygosity (HE) for the two islands.
IslandMeanSDaMinMax
LemnosHO0.3770.1100.0360.608
HE0.3880.1110.0420.500
LesvosHO0.3840.12000.675
HE0.3870.11400.500

aStandard deviation.

Levels of genetic diversity (HO) along with fis, fROH and π^ estimates across the herds of the two islands are shown in Table 2. HO values varied more between Lemnos herds compared to Lesvos. Within Lemnos, HO widely ranged from 0.363 (LEMN2) to 0.393 (LEMN16) with many herds displaying HO values in the lower range (0.36–0.37). In contrast, average HO of the Lesvos herds was always in the upper range (0.38–0.39). Estimates of average herd fis ranged from -0.0125 (LEMN 16) to 0.066 (LEMN2) and ROH-based inbreeding coefficients (fROH) from 0.015 (LEMN16) to 0.063 (LEMN2). Furthermore, overall average fis estimates were higher in Lemnos (0.0284) than Lesvos (0.0066) and this was also the case for fROH (0.065 vs. 0.031, S1 Table). This trend was consistent across all ROH length classes and was strongly manifested for the long ROH lengths (>20 Mb) (0.119 vs. 0.076, S1 Table) that are indicative of recent parental relatedness. The highest inbreeding coefficient observed, fROH = 0.268, was in a Lemnos sample expected to be the result of a mating of close relatives (parent-offspring or full sibs). Estimates of shared ancestry (π^) across herds also ranged widely from 0.0253 (LESV2) to 0.130 (LEMN4). Of the 24 herds, three herds (LEMN1, LEMN4 and LEMN2) yielded average genomic relatedness greater than 0.10 consistent with the presence of multiple second and third order relatives (‘cryptic relatives’). Fig 1 presents a scatter plot of HO against fROH, fis and π^ across herds. As this Figure depicts, there is a linear negative relationship, as derived by the Spearman correlation (rS), between HO and fROH (rS = -0.952, P<0.001) that was more apparent for the pair HOfis (rS = -0.988, p<0.001) and less pronounced for pair Hoπ^ (rs = -0.483, p = 0.0168). Note the close relationship between fis and fROH estimates (rS = 0.965, p<0.001).

Scatter plot of observed heterozygozity (HO) against average inbreeding coefficients (fROH, fis) and familial relationships (π^) per herd and island.
Fig 1

Scatter plot of observed heterozygozity (HO) against average inbreeding coefficients (fROH, fis) and familial relationships (π^) per herd and island.

Table 2
Average observed heterozygosity (HO), expected heterozygosity (HE), fROH and fis within herds of Lemnos (LEMN) and Lesvos (LESV).
HerdNHO (SD)HE (SD)fROHSDfROHfisSDfisalternatives
LEMN1210.365 (0.18)0.355 (0.15)0.06280.0520.05940.0520.1159
LEMN2200.363 (0.18)0.353 (0.15)0.06300.0470.06610.0490.1240
LEMN3190.375 (0.17)0.371 (0.14)0.03800.0410.03440.0420.0573
LEMN470.388 (0.24)0.354 (0.17)0.02480.0070.00040.0090.1306
LEMN5190.385 (0.17)0.380 (0.13)0.02380.0140.00850.0230.0395
LEMN6160.366 (0.18)0.365 (0.15)0.06550.0680.05730.0690.0752
LEMN7130.375 (0.18)0.379 (0.14)0.04150.0460.03260.0480.0295
LEMN8200.387 (0.16)0.382 (0.13)0.02120.0220.00240.0270.0408
LEMN9110.381 (0.18)0.383 (0.14)0.03750.0320.01880.0340.0311
LEMN10130.375 (0.19)0.367 (0.15)0.04520.0410.03340.0430.0797
LEMN11180.372 (0.17)0.373 (0.14)0.04910.0290.04060.0320.0487
LEMN12200.376 (0.16)0.375 (0.14)0.05460.0480.03070.0490.0486
LEMN13210.374 (0.17)0.369 (0.14)0.04780.0360.03560.0380.0700
LEMN14210.375 (0.16)0.379 (0.13)0.04630.0490.03350.0700.0419
LEMN15170.388 (0.17)0.378 (0.14)0.02620.0110.00000.0250.0602
LEMN16220.393 (0.16)0.385 (0.13)0.01460.014-0.01250.0260.0437
LEMN17200.372 (0.16)0.373 (0.14)0.05380.0460.04090.0470.0518
LEMN1890.368 (0.21)0.360 (0.16)0.06420.0320.05090.0340.0975
LESV1190.379 (0.17)0.374 (0.14)0.02730.0250.01920.0300.0488
LESV2190.379 (0.16)0.381 (0.13)0.02540.0220.01720.0230.0253
LESV3190.387 (0.16)0.380 (0.13)0.01630.013-0.00130.0180.0450
LESV4190.389 (0.16)0.383 (0.13)0.01160.005-0.00560.0120.0323
LESV5210.379 (0.17)0.368 (0.14)0.03230.0400.02080.0420.0871
LESV6200.391 (0.16)0.385 (0.13)0.00980.005-0.01080.0170.0346

The sample size (N) and standard deviation (SD) are also provided.

Genetic structure

Fig 2 shows a heatmap of the animals’ genetic relationships along with k-means clustering in attempts to detect patterns in the data according to genomic relatedness and/or geography. Visual inspection of the heatmap revealed a certain degree of clustering of animals at the herd level (depicted as small sized red colored squares on the diagonal) and most interestingly the formation of two off-diagonal clusters depicted as large sized red colored squares, one at the upper left formed by animals of three Lemnos herds (LEMN1, LEMN2 and LEMN3) and another at the lower right with animals of the six Lesvos herds. Note that the three Lemnos herds that were represented as a separate group during k-means clustering are topographically isolated herds, located in the northwestern part of the island on the Vigla mountain and are considered representatives of the local Lemnos sheep.

Heatmap of the genomic relationship matrix of 424 sheep from Lemnos and Lesvos along with k-means clustering (k = 2–8).
Fig 2

Heatmap of the genomic relationship matrix of 424 sheep from Lemnos and Lesvos along with k-means clustering (k = 2–8).

The heatmap presents a color similarity matrix derived by Pearson moment correlations of the genomic relationships between pairs of individuals where red and blue colors indicate high (+1) and low (-1) genetic relationships, respectively. The presence of high intensity off-diagonal clusters indicates population structure. Small sized red colored squares on the diagonal denote high genomic relationships of animals at herd level. Large sized red colored squares at the upper left and lower right show the high genomic relationships of animals of the first three herds of Lemnos (mountain) and the six herds of Lesvos, respectively, that were grouped as separated clusters during k-means clustering.

Two-dimensional plots of the first two PCs derived from PCA on the 424 genotypes and 39,885 markers across and within islands are shown in Fig 3. The first top principal component (PC1) for all genotypes (Fig 3A) accounted for 8.9% of variance in the genetic data and was associated with variability of Lemnos genotypes (red color). In the same plot, the second top principal component (PC2, 8.3% of the genetic variability) was associated with variability of Lesvos genotypes (blue color) and separated most Lesvos from Lemnos samples. PC2 also demonstrated the closeness of Lesvos and Lemnos sheep populations as some (n = 10) Lesvos samples had PC2 values similar to Lemnos sheep (see center of the plot). PCA plot of Lesvos genotypes in the two-dimensional space (Fig 3B) revealed genetic infrastructure within the island, with herds LESV1 (green color) and LESV5 (red color) appearing as separated genetic groups from the remaining herds. Genetic infrastructure was also detected within Lemnos in the two-dimensional space. Here, there were three distinct clusters detected corresponding to herds LEMN1 (green), LEMN2 (yellow) and LEMN3 (light grey) (Fig 3C). Overall, PCA plots increased detection resolution by disclosing previously undiscovered genetic infrastructure within islands. In all the PCA plots the explanatory power of the first two PC axes was low (explaining about 16% - 18% of variation in the genetic data) implying that there could be more undetectable genetic structure in the data, not identifiable via two dimensional plots.

Two-dimensional plots of the first two principal component (PC) axes of genetic variation (labeled PC1 and PC2) of sheep genotypes per island (A), for Lesvos (B) and Lemnos (C). The percentage of variance each PC axis explained is also provided. Each point represents an individual genotype and points are colored by island (top) and herds within islands (middle and bottom). Plots were constructed using the ggplot2 R package [15].
Fig 3

Two-dimensional plots of the first two principal component (PC) axes of genetic variation (labeled PC1 and PC2) of sheep genotypes per island (A), for Lesvos (B) and Lemnos (C). The percentage of variance each PC axis explained is also provided. Each point represents an individual genotype and points are colored by island (top) and herds within islands (middle and bottom). Plots were constructed using the ggplot2 R package [15].

Detection of genetic clusters as ancestry proportions via AD is shown in Fig 4. With the number of clusters (K) set at 2 (CV error = 0.6098) only one distinct cluster was generated that included animals of herds LEMN1, LEMN2 and LEMN3 (light blue color). This cluster was also persistent at K = 3 (CV error = 0.6086) when another cluster corresponding to Lesvos sheep (dark blue color) emerged. At K = 4 (CV error = 0.6163), animals of herds LEMN1, LEMN2 and LEMN3 split-off into two groups suggesting different ancestry genetic backgrounds for animals of herd LEMN1 vs. herds LEMN2 and LEMN3.

Genetic clusters as ancestry proportions of Lemnos and Lesvos sheep revealed by admixture analysis in fastStructure [11].
Fig 4

Genetic clusters as ancestry proportions of Lemnos and Lesvos sheep revealed by admixture analysis in fastStructure [11].

The x-axis represents individuals sorted according to herd_ID within islands. Each color represents a population and each individual is represented by a vertical line partitioned into colored segments whose lengths represent the admixture proportions from K ancestral populations. Plots were constructed in CLUMPAK (http://clumpak.tau.ac.il/, [12]).

Note that at K = 3 and 4 Lesvos genotypes were represented as an admixed population. At the optimal step (K = 5) i.e. lowest CV error (0.6020), there were four major clusters generated, formed by herd LEMN1 (green color), herds LEMN2 and LEMN3 (light blue color), herd LEMN6 (purple color) and the Lesvos breed-specific cluster (dark blue color).

AD results largely concord with outcomes of k-means clustering of the GRM heatmap and PCA confirming genetic distinctness of the three Lemnos herds and the Lesvos sheep. Nevertheless, AD disclosed an additional finding. Many of the Lemnos sheep were represented as admixed individuals with variable and occasionally sizeable degree of Lesvos sheep ancestry (dark blue color, Fig 4). Such admixture patterns are typical for interbreds and fully comply with historical reports and phenotypic appearance of the animals.

Genetic differentiation and genetic distances

The two sheep populations showed little overall genetic differentiation as derived by the overall FST estimate (0.009). S2 Table shows pairwise FST estimates between herds that ranged from 0.014 (LESV4—LESV6) to 0.096 (LEMN2-LEMN4). Results of FST were also confirmed by estimates of Nei’s genetic distances between herds, with lowest distances (0.025) attained for herds LESV4-LESV6 and highest (0.085) for herds LEMN2-LEMN4 (S3 Table).

In accordance with GRM, k-means clustering and PC analyses findings, Neighbor-network phylogenetic analysis of Nei’s genetic distances revealed two separate branches (Fig 5), one corresponding to Lesvos (blue background color) and the other to Lemnos herds (yellowish background color). Within Lemnos, three branches formed, one comprised by herds LEMN1, LEMN2, LEMN3 and LEMN13, located at the northwest part of the island (green background color), a second with herds LEMN6, LEMN9 and LEMN10, located at the southern part of the island and a third including herds showing a wider geographical distribution. Note that long terminal distances of herds LEMN4 and LEMN18 reflect the increase of Nei’ distances resulting from small samples (see S3 Table).

Neighbornet graph of Lesvos and Lemnos sheep based on estimated pairwise Nei’s genetic distances between herds.
Fig 5

Neighbornet graph of Lesvos and Lemnos sheep based on estimated pairwise Nei’s genetic distances between herds.

Yellow background color denotes herds of Lemnos and blue background color herds of Lesvos. For more details, see text. The graph was constructed in SplitsTree5 [14].

Prediction of group membership

Results of LDA across the various scenarios examined are presented in Table 3. Using only 25 markers resulted in an average misclassification error rate as high as 4.25% mainly due to high error rates (10.26%) for the Lesvos samples. The use of 50 markers improved the performance of the classification criterion resulting in average error rates of 1.42%. A SNP panel comprising 100 strongly differentiated markers further improved the performance of the classification criterion resulting in lower average error rates (0.24%). Notably, employment of panel of the 150 most differentiated SNPs was associated with nil misclassification error rates (Table 3).

Table 3
Cross-validation misclassification error rates of discriminant analysis of strongly differentiated SNPs.
FSTMisclassification error rates
Number of markersMean (SDa)Number of Principal ComponentsLemnosLesvosAverage
250.159 (0.034)10.01950.10260.0425
20.01950.10260.0425
50.02280.10260.0448
100.01950.10260.0425
500.140 (0.030)10.00650.03420.0142
20.00650.03420.0142
50.00650.03420.0142
100.00650.05130.0189
1000.1247 (0.027)10.00000.00850.0024
20.00000.00850.0024
50.00330.04270.0142
100.00980.02560.0142
1500.115 (0.025)10.00.00.0
20.00.00.0
50.00650.00000.0047
100.01950.00000.0142

aStandard deviation.

Discussion

Genetic diversity and inbreeding

Both sheep populations studied herein maintained sufficient genetic variation as inferred by HO with values in the range of 0.36 to 0.39. These estimates are comparable or higher to other southern and western European (0.22–0.38) [16] sheep breeds, are higher than the Oxford sheep breed (0.35) in the United States [17] and comparable (Lemnos) or higher (Lesvos) than other Greek breeds such as Chios, Kymi and Lesvos [5].

Of the two populations, Lemnos herds displayed less heterozygosity than expected under random mating. The most plausible cause for the heterozygote deficit is the mating of close relatives i.e. inbreeding [18]. The linear negative correlation between observed heterozygosity and inbreeding levels (Fig 1) and the increased inbreeding coefficients estimated for Lemnos herds (Table 1), suggests that heterozygote deficiency in the Lemnos sheep can be attributed to higher incidence of matings between relatives. In line with these findings, a linear inverse relationship between genetic diversity and inbreeding has also been established in other studies of Balkan and European sheep breeds [5].

Aside from inbreeding, heterozygote deficits can also be interpreted as evidence of the Wahlund effect [19] that occurs when differentiated subpopulations are sampled together (cryptic population structure). This may have also been the case here, as the Lemnos sample consists of several subsamples containing admixtures of individuals from different subpopulations in proportions that were variable from one subsample to the other. However, given that the herds exhibited little genetic differentiation (FST values in the range from 0.01 to 0.09), the Wahlund effect is an unlikely explanation for the observed negative relationship between inbreeding and HO. Furthermore, in contrast to the positive correlation between interlocus fst and fis that is theoretically expected as a result of the Wahlund effect [20,21], here a low negative correlation (r = -0.063, p<0.001) was estimated (results not shown).

Finally, cryptic relatedness i.e. non-random sampling of members from a limited number of families may also cause deviations from Hardy-Weinberg proportions [2224]. Cryptic relatedness could be hypothesized here, as estimates of shared ancestry (π^) across herds (see Table 1 and Fig 1) indicate that in some cases, samples were collected from family members with third or even second degrees of consanguinity (π^~0.125–0.25). However, progeny sampled from a small number of progenitors are expected to exhibit a slight heterozygote excess relative to HW proportions [2224] in contrast to the apparent heterozygosity deficit here. The latter finding allows us to conclusively reject non-random sampling from limited number of reproductive groups as a source of heterozygote deficiency leaving inbreeding as the only plausible explanation.

As in other studies (e.g. [2]), current results suggest that the use of dense genetic marker information is valuable for depicting the existing inbreeding levels, particularly when pedigree data are not available, as in our case. Both fis and fROH estimates delivered comparable results that largely coincide. A close relationship between the two measures (fis and fROH) has also been observed in other Greek breeds, e.g. Frizarta [2], or foreign sheep breeds [25]. Of the two measures, fROH estimates present the best choice as they give a clearer insight on past and recent inbreeding history of animals [26,27]. Particularly, the highest fROH estimates for the long ROH lengths measuring >20 Mb obtained herein that reflect recent inbreeding history, underscore the need for efficient genetic management of the sheep populations to avoid close relative matings, especially for the Lemnos sheep. Recent inbreeding has been identified as a major determinant of the current genetic diversity and effective population size of many Greek sheep breeds (e.g. Frizarta [2], Boutsiko, Karagouniko and Chios [28]).

Genetic differentiation and discriminatory power of differentiated markers

Average FST values estimated herein indicate little genetic differentiation between Lesvos and Lemnos sheep populations. To further support this finding, AD results show that the genetic makeup of most Lemnos sheep examined here has largely been shaped by the Lesvos sheep. This genetic closeness of the two populations can most likely be attributed to past gene flow from Lesvos to Lemnos island as suggested by historic data, local reports and phenotypic appearance.

Another interesting finding obtained here relates to the discriminatory power of differentiated markers in efforts to devise a tool for tracing the origin of local sheep and possibly their products. The idea here was first to identify limited numbers of strongly differentiated markers across islands, then to combine most of their genetic information into small number of variables (one or two PCs), and finally to derive a PC-based discriminatory criterion that could be used to accurately predict animals’ island membership. Present results have shown that the aforementioned approach could be practically implemented via development of a custom-made 150 SNP array that could be made available at an affordable price for large scale analyses. This SNP chip array could be used to discriminate Lesvos sheep from Lemnos sheep as long as no further gene flow occurs between the two islands. Although the discriminatory power of the 150 SNP panel under consideration was assessed using a conservative approach such as CV classification, its discriminatory performance further warrants to be tested using ‘true’ validation procedures by separating calibration and test data sets and a larger number of animals.

Michailidou et al. [28] have also demonstrated that a specialized marker panel including a total number of 3,802 SNPs can be applied for traceability purposes (identification of purebreds and crossbreds) of other indigenous Greek sheep breeds. Similar to the present study, Dimauro et al. [29] also showed that a small number of 108 SNPs were capable to discriminate 21 sheep breeds. Discriminatory SNP panels for breed assignment have also been identified for goats [30] and cattle [31].

Geography related genetic structure and implications

Geographical isolation confounded with genetic origin as well as past interbreeding appeared to be key determinants of genetic structure patterns detected in the present study. The most important implication of the genetic patterns detected herein was the identification of three specific clusters within Lemnos that correspond to equal number of topographically isolated herds located at the rough terrain of the Vigla mountain. According to local farmers interviewed, animals of these herds are morphologically reminiscent of the local Lemnos sheep that historically prevailed across the island some decades ago. More recently, particularly in the lowland areas of Lemnos, the original local sheep were extensively interbred with other indigenous sheep from neighboring islands or other exotic breeds. The pure-bred Lemnos local sheep population was progressively restricted to the mountainous areas and declined in numbers, resulting in only about 800–900 animals remaining today mainly at the Vigla mountain. Due to their high adaptability and resilience, the owners of these remaining herds do not interbreed their animals with other indigenous or exotic sheep breeds. In contrast, interbreeding is still common practice in the lowland areas.

Given the critical census size and the importance of these animals in terms of maintenance of local genetic diversity and ecosystem integrity, immediate conservation interventions should be undertaken as isolation, together with small population size could result in the decline and ultimately extinction of this population [32,33].

Conclusions

The use of genomic data information revealed that Lesvos and Lemnos sheep populations are clustered according to geographical location and/or genetic origin. Within Lemnos, genetic structure patterns were associated either with geographical isolation of local genetic resources or interbreeding with indigenous or exotic sheep breeds. Practical on-ground guidance and broader management actions should be undertaken to ensure the conservation of the ‘rediscovered’ Lemnos sheep.

Acknowledgements

We would like to especially thank all contributing farmers for allowing us to have access to their genetic material.

References

CLigda, JAltarayrah, AGeorgoudis. Genetic analysis of Greek sheep breeds using microsatellite markers for setting conservation priorities. Small Rumin Res. Elsevier; 2009;83: 4248. 10.1016/j.smallrumres.2009.04.002

AKominakis, ALHager-Theodorides, ASaridaki, GAntonakos, GTsiamis. Genome-wide population structure and evolutionary history of the Frizarta dairy sheep. Animal. Cambridge University Press; 2017;11: 16801688. 10.1017/S1751731117000428

IMastranestasis, CLigda, KTheodorou, VEkateriniadou L. Genetic structure and diversity among three greek sheep breeds using random amplified polymorphic DNA-PCR. J Hell Vet Med Soc. Hellenic Veterinary Medical Society; 2011;62: 301313. 10.12681/jhvms.14860

IMastranestasis, L VEkateriniadou., CLigda, KTheodorou. Genetic diversity and structure of the Lesvos sheep breed. Small Rumin Res. Elsevier; 2015;130: 5459. 10.1016/j.smallrumres.2015.07.015

ECiani, SMastrangelo, ADa Silva, FMarroni, MFerenčaković, PAjmone-Marsan, et al. On the origin of European sheep as revealed by the diversity of the Balkan breeds and by optimizing population-genetic analysis tools. Genet Sel Evol. BioMed Central Ltd.; 2020;52: 25. 10.1186/s12711-020-00545-7

EFABIS Greece | Domestic Animal Diversity Information System (DAD-IS) | Food and Agriculture Organization of the United Nations [Internet]. [cited 29 Oct 2020]. Available: http://www.fao.org/dad-is/regional-national-nodes/efabis-grc/en/.

BGruber, PJUnmack, OFBerry, AGeorges. DARTR: An R package to facilitate analysis of SNP data generated from reduced representation genome sequencing. Mol Ecol Resour. Blackwell Publishing Ltd; 2018;18: 691699. 10.1111/1755-0998.12745

SPurcell, BNeale, KTodd-Brown, LThomas, MARFerreira, DBender, et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. Cell Press; 2007;81: 559575. 10.1086/519795

FBiscarini, PCozzi, GGaspa, GMarras. detectRUNS: Detect runs of homozygosity and runs of heterozygosity in diploid genomes [Internet]. 2018 [cited 11 Oct 2020]. Available: https://cran.r-project.org/package=detectRUNS.

10 

RMcQuillan, ALLeutenegger, RAbdel-Rahman, CSFranklin, MPericic, LBarac-Lauc, et al. Runs of Homozygosity in European Populations. Am J Hum Genet. Cell Press; 2008;83: 359372. 10.1016/j.ajhg.2008.08.007

11 

ARaj, MStephens, JKPritchard. FastSTRUCTURE: Variational inference of population structure in large SNP data sets. Genetics. Genetics; 2014;197: 573589. 10.1534/genetics.114.164350

12 

NMKopelman, JMayzel, MJakobsson, NARosenberg, IMayrose. Clumpak: A program for identifying clustering modes and packaging population structure inferences across K. Mol Ecol Resour. Blackwell Publishing Ltd; 2015;15: 11791191. 10.1111/1755-0998.12387

13 

LWPembleton, NOICogan, JWForster. StAMPP: An R package for calculation of genetic differentiation and structure of mixed-ploidy level populations. Mol Ecol Resour. Mol Ecol Resour; 2013;13: 946952. 10.1111/1755-0998.12129

14 

DHHuson, DBryant. Application of phylogenetic networks in evolutionary studies [Internet]. Molecular Biology and Evolution. Oxford Academic; 2006. pp. 254267. 10.1093/molbev/msj030

15 

H.Wickham ggplot2: Elegant Graphics for Data Analysis [Internet]. Springer-Verlag New York; 2016. Available: https://ggplot2.tidyverse.org.

16 

JWKijas, JALenstra, BHayes, SBoitard, LRPorto Neto, MSan Cristobal, et al. Genome-Wide Analysis of the World’s Sheep Breeds Reveals High Levels of Historic Mixture and Strong Recent Selection. CTyler-Smith, editor. PLoS Biol. Public Library of Science; 2012;10: e1001258. 10.1371/journal.pbio.1001258

17 

KMDavenport, CHiemke, SDMcKay, JWThorne, RMLewis, TTaylor, et al. Genetic structure and admixture in sheep from terminal breeds in the United States. Anim Genet. Blackwell Publishing Ltd; 2020;51: 284291. 10.1111/age.12905

18 

S.Wright Systems of Mating. II. the Effects of Inbreeding on the Genetic Composition of a Population. Genetics. Genetics Society of America; 1921;6: 12443. Available: http://www.ncbi.nlm.nih.gov/pubmed/17245959.

19 

S.Wahlund ZUSAMMENSETZUNG VON POPULATIONEN UND KORRELATIONSERSCHEINUNGEN VOM STANDPUNKT DER VERERBUNGSLEHRE AUS BETRACHTET. Hereditas. John Wiley & Sons, Ltd; 1928;11: 65106. 10.1111/j.1601-5223.1928.tb02483.x

20 

RSWaples, FAllendorf. Testing for hardy-weinberg proportions: Have we lost the plot? [Internet]. Journal of Heredity. Oxford University Press; 2015. pp. 119. 10.1093/jhered/esu062

21 

LAZhivotovsky. Relationships between wright’s F ST and F IS statistics in a context of wahlund effect. J Hered. Oxford University Press; 2015;106: 306309. 10.1093/jhered/esv019

22 

A.Robertson The interpretation of genotypic ratios in domestic animal populations. Anim Prod. Cambridge University Press; 1965;7: 319324. 10.1017/S0003356100025770

23 

JFCrow, MKimura. An introduction to population genetics theory. An Introd to Popul Genet theory. New York, Evanston and London: Harper & Row, Publishers; 1970.

24 

VCastric, LBernatchez, KBelkhir, FBonhomme. Heterozygote deficiencies in small lacustrine populations of brook charr Salvelinus fontinalis Mitchill (Pisces, Salmonidae): A test of alternative hypotheses. Heredity (Edinb). Nature Publishing Group; 2002;89: 2735. 10.1038/sj.hdy.6800089

25 

IBelabdi, AOuhrouch, MLafri, SBSGaouar, ECiani, ARBenali, et al. Genetic homogenization of indigenous sheep breeds in Northwest Africa. Sci Rep. Nature Research; 2019;9: 113. 10.1038/s41598-018-37186-2

26 

DPHowrigan, MASimonson, MCKeller. Detecting autozygosity through runs of homozygosity: A comparison of three autozygosity detection algorithms. BMC Genomics. BioMed Central; 2011;12: 460. 10.1186/1471-2164-12-460

27 

MCKeller, PMVisscher, MEGoddard. Quantification of inbreeding due to distant ancestors and its detection using dense single nucleotide polymorphism data. Genetics. Genetics Society of America; 2011;189: 237249. 10.1534/genetics.111.130922

28 

SMichailidou, GTsangaris, GCFthenakis, ATzora, ISkoufos, SCKarkabounas, et al. Genomic diversity and population structure of three autochthonous Greek sheep breeds assessed with genome-wide DNA arrays. Mol Genet Genomics. Springer Verlag; 2018;293: 753768. 10.1007/s00438-018-1421-x

29 

CDimauro, LNicoloso, MCellesi, NPPMacciotta, ECiani, BMoioli, et al. Selection of discriminant SNP markers for breed and geographic assignment of Italian sheep. Small Rumin Res. Elsevier; 2015;128: 2733. 10.1016/j.smallrumres.2015.05.001

30 

SMichailidou, GTTsangaris, ATzora, ISkoufos, GBanos, AArgiriou, et al. Analysis of genome-wide DNA arrays reveals the genomic population structure and diversity in autochthonous Greek goat breeds. PLoS One. Public Library of Science; 2019;14: e0226179. 10.1371/journal.pone.0226179

31 

CDimauro, MCellesi, RSteri, GGaspa, SSorbolini, AStella, et al. Use of the canonical discriminant analysis to select SNP markers for bovine breed assignment and traceability purposes. Anim Genet. John Wiley & Sons, Ltd; 2013;44: 377382. 10.1111/age.12021

32 

AYoung, TBoyle, TBrown. The population genetic consequences of habitat fragmentation for plants. Trends in Ecology and Evolution. Elsevier Ltd; 1996. pp. 413418. 10.1016/0169-5347(96)10045-8

33 

EGTóth, FTremblay, JMHousset, YBergeron, CCarcaillet. Geographic isolation and climatic variability contribute to genetic differentiation in fragmented populations of the long-lived subalpine conifer Pinus cembra L. in the western Alps. BMC Evol Biol. BioMed Central Ltd.; 2019;19: 117. 10.1186/s12862-018-1333-8