Lead Contact
Plasmids are important in microbial evolution and adaptation to new environments. Yet, carrying a plasmid can be costly, and long-term association of plasmids with their hosts is poorly understood. Here, we provide evidence that the Chlamydiae, a phylum of strictly host-associated intracellular bacteria, have coevolved with their plasmids since their last common ancestor. Current chlamydial plasmids are amalgamations of at least one ancestral plasmid and a bacteriophage. We show that the majority of plasmid genes are also found on chromosomes of extant chlamydiae. The most conserved plasmid gene families are predominantly vertically inherited, while accessory plasmid gene families show significantly increased mobility. We reconstructed the evolutionary history of plasmid gene content of an entire bacterial phylum over a period of around one billion years. Frequent horizontal gene transfer and chromosomal integration events illustrate the pronounced impact of coevolution with these extrachromosomal elements on bacterial genome dynamics in host-dependent microbes.


•
Chlamydial plasmids coevolved with their bacterial hosts over a billion years
•Recombination with extrachromosomal elements and viruses shaped plasmid gene content
•Plasmid-mediated chromosomal gene mobilization and transfer drove genome evolution
•Plasmids contributed to adaptation of chlamydiae to diverse eukaryotic hosts
Köstlbacher et al. illustrate how plasmids of intracellular bacteria in the phylum Chlamydiae have coevolved with their hosts over a billion years. By mobilizing chromosomal genes, plasmids contributed to host adaptation and might have mitigated the degenerative effects of Muller’s ratchet in this group of intracellular pathogens and symbionts.
Plasmids are extrachromosomal genetic elements encoding a wide range of genes that allow organisms from all domains of life to adapt to different stresses or niches.1 Ranging in size from below 1 kb to more than 2.5 Mb, the effect of plasmids on their hosts is often poorly understood, as most plasmids have not been fully characterized.2 Among bacteria, plasmids spread genetic information within and between populations, strains, species, and even more distantly related microbes.3 This mechanism of horizontal gene transfer (HGT) is not only an important driver of the evolution of natural microbial populations, but plasmids are also essential tools in diverse applications in genetics and biotechnology, and they have important implications in public health. Major human pathogens, such as enterohemorrhagic E. coli (EHEC), emerge through plasmid acquisition.4 Importantly, plasmid-mediated transfer of antibiotic resistance is a key factor in the spread of antibiotic resistance and the increase in multi-resistant bacterial pathogens.5
Acquisition of a plasmid implies gain of genetic potential, yet there are usually negative side effects. A number of plasmids encode toxin-antitoxin (TA) modules—genetic elements that encode a protein capable of inhibiting cell growth and an antitoxin that counteracts the toxin.6 Loss of such a plasmid, therefore, can be detrimental to the host. Even in the absence of TA systems, production of plasmid proteins (as well as maintenance and repair of plasmid DNA requires host resources) occupies cellular machinery such as ribosomes and disrupts the cellular environment.7, 8, 9 Newly acquired plasmids are thus lost quickly without selection for plasmid-encoded genes.10 In addition, lateral transfer of plasmids and compensatory mutations that reduce the costs for plasmid maintenance are important factors in plasmid persistence.10, 11, 12 During longer phases of host-plasmid coexistence, plasmids can coevolve with their hosts,13, 14, 15, 16, 17 and plasmid-mediated HGT has been proposed to represent a coevolutionary process.18 Plasmids can be altered through coresiding mobile genetic elements like integrative conjugative elements (ICEs), transposons, phages, or even other plasmids.19,20 Longer histories of host-plasmid coexistence are often found in strictly intracellular bacteria. The potentially longest described case is found in Buchnera species, primary endosymbionts of aphids, which seem to be coevolving with their plasmids for up to 70 My.21 Around 25 My years of association with their 8 kb plasmids is found in Riesa species, endosymbionts of blood-sucking lice parasitizing primates.22
To investigate the association of bacteria with plasmids over an extended evolutionary time period, we chose the Chlamydiae, a phylum of obligate intracellular pathogens and symbionts that have engaged in a host-associated lifestyle around a billion years ago.23, 24, 25 A strictly host dependent lifestyle has severe evolutionary consequences for bacterial genomes. Due to small population sizes, genetic drift, and limited access to large gene pools, endosymbiont genomes accumulate deleterious mutations eventually leading to genome size reduction.26, 27, 28 These constraints make obligate intracellular bacteria an interesting subject to study genome and plasmid evolution.29 Most human and animal pathogens classified in the family Chlamydiaceae carry a conserved 7.5 kb plasmid with eight plasmid encoded proteins, referred to as plasmid glycoproteins Pgp1–8.30, 31, 32, 33 These low copy number plasmids34 represent an important virulence factor in the natural host.35, 36, 37, 38 Accumulating evidence indicates coevolution of chlamydial plasmids and chromosomes within the family Chlamydiaceae.39, 40, 41, 42 HGT among Chlamydia trachomatis strains is common, and both intra- and inter-species HGT has been demonstrated experimentally,43, 44, 45 yet the role of plasmids therein is unclear. Intriguingly, all other chlamydial families with cultured representatives have members with plasmids up to 145.3 kb in size.46, 47, 48, 49, 50, 51, 52 Despite the heterogeneity in plasmid size and gene content, based on the presence of conserved plasmid genes it has been proposed that all chlamydial plasmids originated from a single plasmid in the last common ancestor (LCA) of the phylum Chlamydiae.50
In this study, we aimed to recapitulate more than a billion years of plasmid gene content evolution in the bacterial phylum Chlamydiae. We demonstrate that a core set of plasmid genes is conserved, despite the plasticity of plasmid size across the phylum. We investigated the shared ancestry and putative origin of key core plasmid genes by integrating virus and plasmid sequence databases in our evolutionary analysis. We present evidence for an ancient acquisition of the chlamydial plasmid and find that the evolutionary trajectory of plasmid genes is characterized by frequent chromosomal integration and HGT. We propose that vertically inherited plasmids have been important partners in genome evolution in these strictly intracellular bacteria, facilitating genome evolution in the face of small population sizes and genetic drift.
The monophyly of the phylum chlamydiae and its major families is well supported by phylogenomic analysis in previous studies24,50,53 and confirmed with our comprehensive dataset comprising high-quality genomes of plasmid-containing and plasmid-less chlamydiae (Figure S1; Data S1). First, we compared the chlamydial plasmids in our dataset to known plasmids from other bacterial phyla and found that their size of 7.5–145 kb falls into the range of described bacterial plasmids (Figure S2A). The GC content is with 28% to 44% slightly lower than in most other phyla (Figure S2B), and on average 4.8% lower than the GC content of the host chromosomes (Pearson’s correlation coefficient r = 0.603, p = 0.005; Figure S2C), a feature also seen in other host-associated bacteria.54,55 Importantly, statistical analysis shows that the trinucleotide composition of most chlamydial plasmids matches that of the respective chromosomes, indicating plasmid acquisition of the host genomic signature (Data S2A).56
We next performed de novo clustering of the 124,183 proteins encoded on chlamydial plasmids and chromosomes in our dataset into 22,565 gene families. The plasmid proteome comprising in total 733 proteins is represented in 302 chlamydial plasmid gene families, whose members are encoded on at least two plasmids, or on one plasmid and one chromosome (Data S2B). Surprisingly, this amounts to more than 30% of the gene content among all chlamydial plasmids (Figure 1). The plasmids of the Chlamydiaceae and of the fish pathogen Clavichlamydia salmonicola are all smaller than 9 kb in size but are comprised of 100% conserved chlamydial plasmid genes as observed previously.30 The large plasmids (>20 kb) include between 42% (Protochlamydia naegleriophila) and 89% (Criblamydia sequanensis) plasmid genes. Despite the variability in size, chlamydial plasmids are thus remarkably well conserved with respect to their gene content across all of the seven chlamydial families analyzed.


Highly Conserved Gene Content of Chlamydial Plasmids
Chlamydial species tree relating chlamydial plasmids and conservation of plasmid genes. Circles depict chlamydial plasmids and include plasmid size in kilobases, GC content in percent, and the proportion of conserved plasmid-encoded genes. Genes present on other chlamydial plasmids and chromosomes are shown in red, genes present on other chlamydial plasmids only in orange, genes present on one plasmid only and on other chlamydial chromosomes in yellow, and genes present only on a single plasmid within the chlamydiae are gray. Bar indicates 0.2 substitutions per site. See Figure S1 for the full species tree. See also Figure S2 and Data S1 and S2.
Taken together, acquisition of the chromosome trinucleotide signature and the high proportion of genes shared among chlamydial plasmids and between plasmids and chromosomes provide first evidence for an extended period of coexistence and a shared evolutionary history of chlamydial plasmids with their bacterial hosts.
To understand better the evolutionary building blocks that formed the extant chlamydial plasmids, we focused on the most highly conserved plasmid gene families and asked whether it was possible to recover a common plasmid gene set. Consistent with previous observations, chlamydial plasmids lack a pronounced backbone, i.e., a larger set of genes present in all chlamydial plasmids.47 Nonetheless, there are common gene families between subsets of plasmids (Figures 2B and S2D). To investigate the relations between these gene families, we performed partial correlation network analysis. Briefly, we measured the degree of association between gene families based on their occurrence patterns on chlamydial plasmids. Of 151 gene families occurring on at least two plasmids, 92 were included in the network because they showed a statistically significant correlation (false discovery rate [FDR] corrected p ≤ 0.05) based on their presence/absence on diverse chlamydial plasmids (Figure 2A). Using an algorithm for the identification of densely connected regions in the correlation network, these conserved plasmid gene families clustered into three statistically significant subgraphs (with p ≤ 0.05). Based on their abundance and predicted functions, we refer to these subgraphs as (1) core group, (2) type IV secretion (T4SS) group, and (3) phage group, respectively (Figure 2A).


The Mosaic Gene Set of Chlamydial Plasmids
(A) Partial correlation network of plasmid gene families present on more than two chlamydial plasmids (n = 151, Data S4). The network represents the degree of association between gene families based on their occurrence patterns on chlamydial plasmids. Nodes represent gene families and edges represent the correlation coefficient. Only statistically significant correlations with an FDR corrected p ≤ 0.05 are shown. Three highly connected groups of gene families can be identified, a core group (green), a type IV secretion system (T4SS) group (yellow), and a phage group (violet). Gray nodes are outliers or overlap between two clusters. Labels indicate highly conserved plasmid gene families (present on ≥4 plasmids).
(B) Distribution of highly conserved plasmid gene families and their predicted function. Numbers in boxes represent the gene family copy number on plasmids. See also Figures S2 and S3 and Data S2.
The core group represents the largest and most conserved set of plasmid gene families, comprising 46 (15.2%) of all conserved plasmid gene families (Figure 2B; Data S2C). Many of these have characteristic plasmid functions, and five of seven gene families that make up the Chlamydiaceae plasmid (Figure 2B) belong to this group. This includes the helicase Pgp1 essential for plasmid maintenance in the Chlamydiaceae,57 the predicted plasmid partitioning protein ParA/Pgp5, the integrases Pgp7 and Pgp8, as well as Pgp2 and Pgp6, two proteins of unknown function, which are essential for plasmid maintenance.57,58 Some of these genes are known to modulate gene expression,58,59 and in C. trachomatis two highly expressed antisense sRNAs are encoded in pgp5 and pgp7/8.60,61 Other gene families in the core group function in stress response or are involved in plasmid persistence, such as an efflux transporter and a TA system (Figure 2B).
The T4SS group comprises a set of gene families associated with type IV secretion. The role of the chlamydial T4SS is still unclear, but it is monophyletic based on phylogenetic analysis of the outer membrane protein TraN50 (Figure S3A) and occurs on the plasmids of S. negevensis, P. naegleriophila, and R. massiliensis. The T4SS is integrated into the genome of some members of the Parachlamydiaceae and Simkaniaceae (Figure S3B) and was suggested to originate from an Alphaproteobacteria donor.50
Finally, the phage group contains gene families almost exclusively present on the P. massiliensis and C. sequanensis plasmids, which encode among others a phage terminase (OG0004061), tail tip protein L (OG0004637), and RNA polymerase-associated protein Gp33 (OG0000297), indicating a putative phage origin for these gene families (Data S2C).
Overall, we identified a mosaic plasmid gene set consisting of a large core and two gene sets likely originating from other plasmids and prophages. One conceivable scenario would be that the core gene set is a remnant of an ancestral plasmid acquired by an early chlamydiae ancestor.
We thus next asked whether gene families in the plasmid core gene set indicate a common origin of chlamydial plasmids. To address this, we analyzed the phylogeny of the most well represented gene families, parA/pgp5 and pgp7/8, both of which have predicted functions typically associated with extrachromosomal elements (Figure 2B).
Homologs of parA/pgp5 are found on all chlamydial plasmids and all chromosomes (Data S3). This gene family encodes ATPases with cytoskeletal properties.62 ParA (or homologs like RepA, SopA) interacts with the DNA-binding protein ParB (RepB, SopB) and is integral for the partitioning of many low copy plasmids and phages.62, 63, 64 The system is also often encoded chromosomally in bacteria and can contribute to chromosome partitioning.64, 65, 66 Of note, chlamydial plasmids lack parB homologs, although parB is present on most chlamydial chromosomes.
The parA/pgp5 gene family containing chlamydial plasmid and chromosomal copies is large (n = 71) and comprises five eggNOG Clusters of Orthologous Groups (COG) (Data S3). Yet, the original Chlamydiaceae pgp5 and the highly conserved chromosomal copy of parA found on 38 chlamydial chromosomes all belong to a single eggNOG COG (ENOG4105C2U). Phylogenetic analysis shows that all chlamydial members of this COG are monophyletic, with plasmid Pgp5 and chromosomal ParA proteins representing sister groups (Figure 3A). This suggests that parA/pgp5 was present already in the last common chlamydial ancestor, underwent gene duplication, and was subsequently maintained on some plasmids and on all closed chlamydial genomes. The closest relatives of chlamydial parA/pgp5 are parA homologs found on plasmids of cyanobacteria and actinobacteria. This indicates that the ancestral chlamydial parA/pgp5 originated from a plasmid and was subsequently integrated in chlamydial chromosomes. The presence of additional yet more distantly related plasmid-encoded parA/pgp5 genes in some chlamydiae (in eggNOG ENOG4107QJE) suggests that the ancestral chlamydial parA/pgp5 has been replaced by a homolog from an unrelated plasmid in at least one lineage, the Parachlamydiaceae (Figure 3B; Data S3). This scenario is consistent with the presence of two plasmids with parA/pgp5 orthologs of different origin in R. massiliensis and earlier analysis.46


A Plasmid-Derived ParA/Pgp5 in the Chlamydial Ancestor and Viral Origin of Integrase Pgp7/8
(A) Phylogenetic analysis of chlamydial parA/pgp5 gene copies in EggNOG ENOG4105C2U and its plasmid representatives. Chlamydial plasmid and chromosomal clades are indicated and represent monophyletic sister groups.
(B) Phylogenetic analysis of the second chlamydial parA family in ENOG4107QJE and its plasmid representatives.
(C) Phylogenetic analysis of chlamydial pgp7/8 and its closest relatives of viral origin, the VOGDB VOG000016. Light blue indicates chlamydial branches, black other bacterial branches, red plasmid genes from the dereplicated RefSeq plasmid dataset, and purple viral genes. Maximum likelihood phylogenetic trees with best fit models (LG+C40+F, LG+C10+G+F, and LG+C60+G+F, respectively) with 1,000 ultrafast bootstraps are shown. Bootstrap support for monophyly of chlamydial clades in all trees is ≥95% and the SH-like approximate likelihood ratio is ≥80%.
Scale bars indicate one substitution per position. See also Data S3 and S4.
The second most conserved gene family on chlamydial plasmids is a putative integrase referred to here as Pgp7/8 (OG0000907, Figure 3B) due to the presence of two distinct copies on extant Chlamydiaceae plasmids. pgp7/8 is exclusively found on chlamydial plasmids and chromosomes and is notably absent from all other known prokaryotic genomes (EggNOG ENOG4106VZX). This led us to investigate a putative viral origin by performing homology searches of Pgp7/8 proteins against the Virus Orthologous Groups database (VOGDB, http://vogdb.org/, Data S4). Hidden Markov-model-based search places Pgp7/8 into a large viral orthologous group (VOG, VOG000016) with 652 members. Phylogenetic analysis of this dataset merged with all chlamydial integrases demonstrated that chlamydial Pgp7/8 is a monophyletic clade deeply branching among viral homologs (Figure 3C). The closest relatives include the putative integrases of Mycoplasma phage MAV1 (NP_047270.1) and a clade of Siphoviridae that infect diverse bacteria and archaea. This suggests that pgp7/8 was acquired once early in chlamydial evolution. Phages are known to have had a long-standing relationship with plasmids and can contribute to plasmid gene influx.67
Altogether, our phylogenetic analysis of the two most well-represented gene families on chlamydial plasmids suggests the presence of key plasmid genes in the last common chlamydial ancestor. The monophyly of the chlamydial partitioning protein ParA/Pgp5 indicates that this gene evolved independently on plasmids and chromosomes after an ancestral duplication event. The closest relatives are encoded on extrachromosomal genetic elements, pointing to an extrachromosomal origin of these genes.
A noticeable finding of our gene content analysis was that the majority of chlamydial plasmid gene families is also represented on chlamydial chromosomes (n = 255, 84.4%; Table S1). Inversely, the chromosomes of all known chlamydiae encode on average 6.4% plasmid gene families (31–204 genes, standard deviation [SD] ±1.34%; Figures 4, S4A, and S4B). This may be explained in two ways: either by integration of chromosomal genes into the plasmid or by integration of plasmid genes into the chromosome. The integration of plasmid genes into chlamydial chromosomes has been documented for a foreign tetC gene in the pathogen Chlamydia suis Tcr68 and for the T4SS in the plasmidless amoeba symbionts Protochlamydia amoebophila and Parachlamydia acanthamoebae.50 A high frequency of gene transfer between plasmids and chromosomes has also been observed in other bacteria69 and has been experimentally shown in artificial soil bacterial communities.70 This process, also referred to as gene externalization, represents an important driver of bacterial genome evolution.71 In addition, a number of plasmid genes are apparently being maintained both on chlamydial plasmids and chromosomes in the same organism (Figure 4). Such redundancy is thought to facilitate innovation through neo-functionalization.72 On the other hand, in small populations, as in the case of obligate endosymbionts, genetic redundancy can counteract Muller’s ratchet—the fixation of slightly deleterious mutations combined with the random loss of the fittest genotypes that may lead to extinction.73, 74, 75


High Mobility of Genes between Plasmids and Host Chromosomes
The outer ring shows representations of chlamydial genome sequences including 13 chromosomes and 12 plasmids. The inner ring illustrates plasmids only. Outer links connect plasmid genes with their chromosomal homologs in the respective host chromosome. Inner links connect plasmid genes to chromosomal homologs in other chlamydial species. All chlamydial chromosomes, including those of plasmidless representatives such as P. acanthamoebae and P. amoebophila, encode a high percentage of conserved plasmid gene families (6.4% on average). See also Figure S4 and Table S1.
How did the high frequency of gene flow between chlamydial plasmids and chromosomes affect the functional role of both? To this end, we compared all gene families with at least one plasmid encoded copy with respect to their predicted function in cellular pathways according to eggNOG functional categories. This analysis showed that the functional profile of the plasmids is diverse but markedly differs from that of the chromosomes (Figures S4C and S4D). Chlamydial plasmid gene families for which a function could be predicted are involved in diverse cellular processes including secretion, transport, energy production/conversion, and transcription. Notably, plasmids are lacking genes functioning in translation, ribosomal structure and biogenesis, and cell motility (Figures S4C and S4D; Data S2B). The largest fraction of plasmid genes was assigned to the category “replication, recombination, and repair,” which was significantly enriched in comparison to chromosomal genes (22% versus 8%; p = 6.38 × 10−16, one-tailed Fisher’s exact test; Figure S4C). The majority of these genes represent transposases, which are considered important factors in genome evolution and may represent high turnover genes on extrachromosomal elements.71
Taken together, our analysis documents a high frequency of gene transfer events between chlamydial plasmids and chromosomes, possibly facilitated by transposases, which are abundantly present on most chlamydial plasmids. Despite this, chlamydial plasmids have maintained a characteristic functional profile different from chlamydial chromosomes. The high level of gene flow dynamics and the presence of characteristic plasmid genes on nearly all chlamydial chromosomes further support a long-standing relationship between chlamydiae and their plasmids.
We next investigated the impact of gene transfer on the chlamydial plasmid during its prolonged association with its bacterial hosts. To this end, we calculated maximum likelihood phylogenetic trees for all chlamydial gene families and applied a gene tree-species tree reconciliation approach as implemented in ecceTERA.76 Briefly, to reduce gene tree uncertainty, ecceTERA reconciles samples of gene family trees with the species tree (Figure S1) and creates species tree aware gene trees.77 Based on these more accurate gene trees, gene duplication, transfer, and loss events are estimated using all parsimonious reconciliations (see STAR Methods).
We first compared two sets of gene families, those that are predominantly encoded on plasmids and those predominantly encoded on chromosomes. We determined the number of gene transfers per node in a gene tree for each gene family, referred to as the number of normalized transfers per gene family. We observed a significantly increased transfer rate for plasmid-encoded gene families in comparison to chromosomal gene families (median of 0.125 versus 0.066 normalized transfers per gene family; p = 2.9 × 10−8, unpaired Wilcoxon signed-rank test; Figure 5A). The apparent higher mobility of plasmid-encoded genes indicates a dynamic evolutionary history and suggests that chlamydial plasmids were important mediators of HGT during the evolution of chlamydial genomes. This analysis also revealed that chlamydial genomes were differently affected by inter-species gene transfer with respect to plasmid gene families (Figure 5B). The most striking set of transfers was observed between Parachlamydia massiliensis and Criblamydia sequanensis, with 29 transfer events including pgp7/8 (Figure S5) and parA/pgp5. As this constitutes more than 65% of all plasmid genes in these species, this likely indicates acquisition of a complete plasmid, as suggested above in our analysis of conserved plasmid-encoded genes (Figure 1). The direction of this inter-species plasmid transfer cannot be reliably inferred, but the better fit of the P. massiliensis plasmid to its host’s chromosomal signature in terms of GC content and trinucleotide signature—as opposed to C. sequanensis and its plasmid—suggests a fairly recent transfer from P. massiliensis to C. sequanensis (Figure S2C; Data S2A).


Increased Mobility of Plasmid Gene Families and Inter-family Transfer Events of Plasmid Genes
(A) Boxplot showing the number of normalized transfer events per gene family as inferred from gene tree-species tree reconciliations using 2,950 chromosomal and 141 plasmid gene families. The p value was calculated using the Wilcoxon signed-rank test. Outliers are not shown but are included in the statistical analysis.
(B) Transfer events of plasmid genes superimposed on a schematic chlamydial species tree collapsed at the family level. The transfer of T4SS-associated genes between the Simkaniaceae and the Parachlamydiaceae is indicated in orange. Core plasmid gene transfers between multiple families and a potential whole plasmid transfer from P. massiliensis to C. sequanensis are shown in green. The inferred transfer of a prophage is indicated in purple. See also Figure S5.
Two other notable sets of transfer events involve the T4SS-associated genes and the putative prophage, both previously identified as major building blocks of chlamydial plasmids (Figure 3). Gene tree-species tree reconciliation indicates that these gene sets were transferred between the LCAs of the Simkaniaceae and Parachlamydiaceae, and between the Parachlamydiaceae and Criblamydiaceae (Figure 5B).
Collectively, gene tree-species tree reconciliations revealed chlamydial plasmids as important facilitators of HGT. Plasmid-encoded gene families are more frequently transferred than chromosomal gene families, and there is evidence for interspecies transmission of complete plasmids and large functional units, such as the chlamydial T4SS. HGT is a major driver of microbial genome evolution, promoting the adaptation to novel environmental conditions.78 It is considered particularly important for strictly intracellular bacteria as it provides another means to escape Muller’s ratchet.73,74
Combining our comprehensive phylogenetic analysis and evidence from gene-tree species-tree reconciliation results in an evolutionary scenario for a common origin of extant chlamydial plasmids and a shared evolutionary history with their bacterial hosts. We base this scenario on the findings of (1) the acquisition of the host chromosome trinucleotide signature of chlamydial plasmids, (2) the presence of a set of co-occurring core chlamydial plasmid genes, (3) the monophyly of the key chlamydial plasmid genes pgp5/parA and pgp7/8 and their inferred extrachromosomal origin, (4) the high prevalence of chlamydial plasmid genes on chromosomes, and (5) the predominantly vertical inheritance of pgp7/8. We derived the gene content of putative ancestral plasmids using the gene tree-species tree reconciliations of plasmid enriched gene families.
The reconstructed ancestral plasmid last common ancestor (plasmid LCA or pLCA) present in the LCA of all chlamydiae contained 11 plasmid gene families (Figure 6; Table S2), including parA/pgp5, the helicase pgp1, and pgp6, the two latter of which are essential for the maintenance of extant Chlamydiaceae plasmids.57 Molecular dating of the chlamydiae LCA estimated an age of 700 My to one billion years,23,24 which likely places the chlamydiae pLCA at approximately the same time.


A Scenario for the Evolutionary History of Chlamydial Plasmids
Reconstructed ancestral plasmids (pLCAs; middle panel) are shown as rings along a schematic timeline of evolutionary events over an estimated period of 1 billion years (left). Ring segments indicate plasmid-encoded genes colored by functional groups (green, chlamydial core plasmid; yellow, T4SS genes; purple, phage genes). The numbers in the rings refer to the number of gene families present on the ancestral plasmids. Major events include 1: acquisition of the original Chlamydiae pLCA by the last common chlamydial ancestor from an unknown donor; 2: acquisition of the viral integrase pgp7/8; 3: acquisition of the transcriptional regulator pgp4 in the Chlamydiaceae/Clavichlamydiaceae pLCA; 4: acquisition of the T4SS by the Parachlamydiaceae ancestor from an Alphaproteobacteria ancestor; 5: transfer of the T4SS and pgp7/8 from the Parachlamydiaceae pLCA to the Simkaniaceae pLCA; 6: acquisition of a second plasmid in the Parachlamydiaceae LCA that encodes a TA system; 7: inter- and intra-family plasmid gene flow, such as plasmid transfer from P. massiliensis to C. sequanensis or plasmid integration in P. amoebophila. See also Figure S6 and Table S2.
Next, the pLCA of the Parachlamydiales-Chlamydiales ancestor presumably acquired an integrase from a phage donor related to the Siphoviridae, which subsequently underwent gene duplication (Figure 3). Most chlamydial plasmids retained only one copy, while both genes diverged to give rise to pgp7 and pgp8 in current Chlamydiaceae and Clavichlamydia plasmids (Figure 6; Figure S5). Consistent with this, the almost entirely vertical transmission of this gene family has been observed earlier for C. trachomatis strains42 and the genus Chlamydia in general.40
A decisive event occurred during the divergence of the ancestor of the Parachlamydiaceae, Criblamydiaceae, and Waddliaceae, and the ancestor of the Chlamydiaceae and Clavichlamydiaceae (Figure 1). The ancestral plasmid of the latter gained pgp4, which today is a key plasmid specific transcription factor of virulence genes for in vivo pathogenicity in the Chlamydiaceae.79 This event likely contributed to niche differentiation and the infection of higher animals including humans, as loss of the plasmid has, in some Chlamydia species, been shown to lead to attenuated infection.80,81 At this point, the plasmid already included seven of the eight plasmid gene families encoded in the extant Chlamydiaceae plasmid (Figure 6).
In the Parachlamydiaceae/Criblamydiaceae/Waddliaceae lineage, which includes a large number of diverse species that live as symbionts of amoeba in the environment,82,83 the ancestral plasmid underwent major expansions through several independent gene acquisitions and almost doubled in gene content (from 28 to 55 gene families). A T4SS was acquired from an Alphaproteobacteria donor50 and integrated into the plasmid (Figure 6; Figure S4). Intriguingly, the T4SS does not appear to originate from a conjugative plasmid but is likely an ICE84 as the closest relatives are extant Rickettsia ICEs.85 In close temporal proximity, another plasmid entered the Parachlamydiaceae ancestor, bringing a set of Parachlamydiaceae plasmid specific genes, including a TA system (Figure S6). Together this gene set forms the backbone for extant plasmids in members of the Parachlamydiaceae. The Parachlamydiaceae T4SS was subsequently acquired together with a number of accessory genes by the plasmid in the Simkaniaceae ancestor (Figures 5 and 6) and (partially) integrated in the chromosome in some Parachlamydiaceae members. Throughout this series of evolutionary events and during the long coevolution of chlamydiae with their plasmids, chromosomal integration of plasmid genes and mobilization of chromosomal genes contributed to shaping the chlamydial genome (Figure 5).
In summary, plasmids are well known for their contribution to the adaptation and evolution of microbes. Yet, coevolution of plasmids with their hosts has mostly been studied using experimental evolution approaches14, 15, 16, 17 or evolutionary genomics for closely related microorganisms.86, 87, 88 Plasmids depend on host resources for maintenance and evolve toward a reduction of metabolic costs and/or an increased persistence.12,89,90 Additionally, adaptation on the host side can, given selective pressure for a period of time or mitigating environmental conditions, reduce the cost of plasmid carriage.10,16,17,91,92 Here, we provided evidence that, in the phylum Chlamydiae, this has led to an unmatched intimate evolutionary relationship, in which an ancient acquisition of an ancestral plasmid and subsequent gene gains and losses gave rise to a collection of extant plasmids in a highly diverse range of bacterial hosts. These plasmids are crucial for the virulence of modern human and animal pathogens79,93, 94, 95 and widespread among their environmental representatives. Chlamydial plasmids have promoted inter-species gene transfer, which in concert with the ancient and strictly intracellular lifestyle of chlamydiae has likely contributed to the maintenance and persistence of the plasmid over extended evolutionary time periods.96 Plasmids may have provided a means for this group of strictly intracellular microbes to ameliorate the degenerative effects of Muller’s ratchet by promoting HGT.97 To the best of our knowledge, we documented the presumably oldest known system of host-plasmid coexistence and coevolution, with a shared history of around one billion years.23,24

Further information and requests for resources should be directed to and will be fulfilled by the Lead Contact, Matthias Horn (matthias.horn@univie.ac.at).
This study did not generate new unique reagents.
Alignment files, tree files, and the python script are available at zenodo (https://zenodo.org/record/3859863).
To assemble a comprehensive genome sequence dataset, we collected 26 publicly available Chlamydiae genomes from GenBank and 28 genomes of members of the PVC superphylum from the NCBI RefSeq database (Data S1 and Figure S1)116,117. All genomes were checked for completeness and contamination with checkM v1.0.798 using the “taxonomy_wf” setting and the marker gene set for bacteria. We included only genomes with greater than 85% completeness and lower than 5% contamination.
Genomic signatures of chlamydial plasmids and chromosomes were calculated as described in56,118. Briefly, we cut chromosomal sequences into non-overlapping 10,000 bp segments and calculated the occurrence of trinucleotides on both strands with the ‘seqinr’ package100 in R 3.5.199. We then calculated δ-distance and Mahalanobis distance for plasmid sequences against the mean chromosome signature. We calculated the probability of the distance of the plasmid signature to the mean chromosomal signature to be smaller than that of the chromosomal segments, here referred to as P (δ) or P (Mahalanobis)). We calculated a median probability of 0.65 (P(Mahalanobis), IQR 0.27- 0.82; Data S2A) and set a P(Mahalanobis) cutoff of 0.6 for defining highly similar plasmid and chromosomal pairs as proposed by56.
To be able to assemble comprehensive datasets for phylogenetic analysis, which includes all relevant plasmid homologs we first generated a dereplicated RefSeq plasmid dataset. All 13,200 plasmids present in NCBI RefSeq116 (July 2018, ftp://ftp.ncbi.nlm.nih.gov/refseq/release/plasmid/) were clustered with Drep v1.4.3101 at a 90% ANI cutoff with primary clustering resulting in 4,736 representative plasmids. We then extracted the associated proteome of representative plasmids to generate a query database for plasmid-associated protein sequences.
We mapped all proteins of our genome sequence dataset to eggNOG 4.5.1102 to receive Clusters of Orthologous Groups (COG) classifications. We used eggNOG-mapper v1.0.1103 with the bacteria optimized database using the “–database bact” option and default settings. For chlamydial plasmid encoded genes of interest with COG assignments we used the eggNOG provided HMM (hidden markov model) to screen the dereplicated RefSeq plasmid proteome for homologs. Using the hmmsearch program of the HMMER suite v3.1b2104 with an e-value cutoff of 10−1 we first identified potential homologs which we then assigned to COGs with eggNOG-mapper as described above.
To be able to include homologs from virus genomes in our analysis, we downloaded all virus orthologous groups (VOGs) from VOGDB v72 (http://vogdb.org/). Using the hmmpress program of HMMER suite v3.1b2104 we created a HMM database of all VOG HMMs. We searched plasmid encoded genes with the hmmsearch program with an e-value cutoff of 10−5 and selected the hits with the highest bitscore to assign VOGs for each gene.
To infer gene relationships also for genes lacking representatives in public databases we performed de novo clustering of all proteins in our genome dataset. Protein sequences were aligned using the “blastp” program (BLAST suite v2.5.0+106) to compute sequence similarity scores between sequences with an expectation value cutoff of 10−3. Using OrthoFinder 2.0107 we clustered the proteins into orthogroups (OGs), referred to as gene families.
To study co-occurrence of the most conserved gene families, i.e., those that were present on at least two plasmids, we performed correlation network analysis. We included all chlamydial plasmids but only used one representative of the Chlamydiaceae (C. trachomatis A/HAR-13) due to the high redundancy of members of this family with respect to plasmid gene content. A partial correlation network of conserved plasmid gene families was inferred using R 3.5.199 with the GeneNet 1.2.13 package108 with default settings based on presence/absence patterns of 151 conserved plasmid gene families. Only statistically significant correlations with an FDR corrected p value ≤ 0.05 were retained. Gene families were clustered into groups in Cytoscape 3.7.0109 with the ClusterONE 1.0 plugin110 with default settings, except an overlap threshold of 10-3. Significant groups had a p value ≤ 0.05.
For a detailed phylogenetic analysis of datasets assembled by mapping chlamydial proteins to COGs and VOGs, protein sequences were either aligned with MAFFT 7.222111 using the “–localpair” and “–maxiterate 1000” parameter, or in the case of VOGs with the VOGDB-provided HMM. The ENOG4105C2U alignment was trimmed with Noisy v1.5.12112, the ENOG4107QJE and VOG000016 alignments were trimmed with trimAl “-gappyout” to reduce the gap rate113. Identical sequences were removed prior alignment. Maximum likelihood phylogenies were calculated with IQ-TREE 1.6.2114 under the empirical LG model119. We applied the same model testing regiment as proposed by Dharamshi et al.53 with the empirical mixture models C10 to C60120. Because of the large number of sequences in the ENOG4107QJE dataset (n = 1,738), mixture model testing was restricted to C10 only. Support values were inferred from 1000 ultrafast bootstrap replicates121 with the “-bnni” option for bootstrap tree optimization and from 1000 replicates of the (Shimodaira-Hasegawa) SH-like approximate likelihood ratio test122. Trees were visualized and edited using the Interactive Tree Of Life v4105.
Species tree reconstruction was performed with the entire genome sequence dataset (Data S1). 43 conserved marker genes were extracted and aligned in checkM v1.0.7 with the “tree” workflow98. Bayesian tree samples with five MCMC chains in parallel (n = 10,000 each) were inferred using the CAT+GTR model120 with 4 discrete gamma categories in PhyloBayesMPI 1.7a115. Convergence was assumed once the discrepancies in bipartition frequencies dropped below 0.1 and the effective sample sizes for continuous parameters were greater 100 (according to the bpcomp and tracecomp commands in PhyloBayes, respectively) after burnin (n = 2,500). Species tree was rooted according to Kamneva et al.24 at the base of the Planctomycetes.
We aligned all gene families (OGs) calculated with OrthoFinder using MAFFT 7.222111 using the “–localpair” and “–maxiterate 1000” parameter. The protein alignments were trimmed with Noisy112. For each family with more than three sequences (n = 5,184) we reconstructed unrooted phylogenies with IQ-TREE 1.6.2114 using the implemented ModelFinder123 to find the appropriate model. The best fit model in combination with posterior mean site frequencies to model site heterogeneity124 under the C20 model125 was used to calculate 1,000 ultra-fast bootstrap samples for the downstream amalgamation procedure (n = 4964). 220 gene families had four or more sequences in total, but less than 4 unique sequences. There the only unrooted topology was used. We then performed gene tree-species tree reconciliation with ecceTERA v1.2.476, a program that implements a generic parsimony reconciliation algorithm, which accounts for duplications, losses and transfers, as well as speciation, and can accurately estimate species-tree aware gene trees using amalgamation77. We used the undated species tree mode “dated=0” without transfer from unsampled lineages “compute.TD=false.” We calculated the average genome size flux126 between ancestors for all fixed combinations of HGT cost 1-10 and duplication cost 1-10. For the ten cost vectors with minimal flux we calculated the mean support values of the symmetric median reconciliations for all gene trees (as proposed in127). We proceeded with cost settings of HGT = 3 and duplication = 1 (highest average support) for 4,624 gene families. For 542 gene families we used one of the alternative cost settings from the ten cost vectors with minimal genome size flux, if they were better supported.
We used a custom python script to integrate over the computed gene family phylogenies. Briefly, we extracted the presence/absence information for all gene families and their evolutionary events from root to leaves of the species tree for the ecceTERA symmetric median reconciliations. We then summarized the reconstructed sets of gene families that were present in chlamydial LCAs and tracked speciation, duplication, and loss events, as well as horizontal transfers. To identify chlamydial gene families that are predominantly encoded on plasmids we analyzed the number of occurrences of each gene family on chlamydial chromosomes and plasmids (n = 3,091 with more than one chlamydial sequence), respectively. We used hypergeometric tests in the R base package phyper128 with “lower.tail=T” to identify gene families that are significantly enriched on plasmids with a “BH”129 corrected p value ≤ 0.05 using the R base package “p.adjust.” pLCAs were then reconstructed based on these plasmid enriched gene families present in chlamydial LCAs. We calculated normalized gene transfers per gene family by dividing transfer events inferred by ecceTERA by the number of chlamydial branches in the gene tree (number of branches: 2 x (number of leafs - 1)). We then used a two-sided Wilcoxon signed rank test using the R base function “wilcox.test” to test for statistical significance.
All statistical tests and data analysis were performed in R version 3.5.199 and are described in the method details.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
83
84
85
86
87
88
89
90
92
93
94
95
96
97
98
99
100
101
102
103
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
We want to thank Masaki Shintani for advice and discussions concerning plasmid biology and Craig Herbold for advice and discussions about gene tree-species tree reconstructions. The Life Science Compute Cluster (LiSC; http://cube.univie.ac.at/lisc) was used for computational analysis. This project was supported by the European Research Council ERC (EVOCHLAMY, grant no. 281633 to M.H.), the
S.K., A.C., D.D., and M.H. conceptualized the study. S.K. and A.C. performed comparative genomic analysis. S.K. performed phylogenetic analyses and gene tree-species tree reconciliation analyses. S.K., A.C., T.H., and M.H. interpreted the results. All authors wrote and edited the manuscript.
The authors declare no competing interests.