Open Access
Research Article
Rickettsia Phylogenomics: Unwinding the Intricacies of Obligate Intracellular Life
1 Virginia Bioinformatics Institute at Virginia Tech, Blacksburg, Virginia, United States of America, 2 Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
Abstract
Background
Completed genome sequences are rapidly increasing for Rickettsia, obligate intracellular α-proteobacteria responsible for various human diseases, including epidemic typhus and Rocky Mountain spotted fever. In light of phylogeny, the establishment of orthologous groups (OGs) of open reading frames (ORFs) will distinguish the core rickettsial genes and other group specific genes (class 1 OGs or C1OGs) from those distributed indiscriminately throughout the rickettsial tree (class 2 OGs or C2OGs).
Methodology/Principal Findings
We present 1823 representative (no gene duplications) and 259 non-representative (at least one gene duplication) rickettsial OGs. While the highly reductive (~1.2 MB) Rickettsia genomes range in predicted ORFs from 872 to 1512, a core of 752 OGs was identified, depicting the essential Rickettsia genes. Unsurprisingly, this core lacks many metabolic genes, reflecting the dependence on host resources for growth and survival. Additionally, we bolster our recent reclassification of Rickettsia by identifying OGs that define the AG (ancestral group), TG (typhus group), TRG (transitional group), and SFG (spotted fever group) rickettsiae. OGs for insect-associated species, tick-associated species and species that harbor plasmids were also predicted. Through superimposition of all OGs over robust phylogeny estimation, we discern between C1OGs and C2OGs, the latter depicting genes either decaying from the conserved C1OGs or acquired laterally. Finally, scrutiny of non-representative OGs revealed high levels of split genes versus gene duplications, with both phenomena confounding gene orthology assignment. Interestingly, non-representative OGs, as well as OGs comprised of several gene families typically involved in microbial pathogenicity and/or the acquisition of virulence factors, fall predominantly within C2OG distributions.
Conclusion/Significance
Collectively, we determined the relative conservation and distribution of 14354 predicted ORFs from 10 rickettsial genomes across robust phylogeny estimation. The data, available at PATRIC (PathoSystems Resource Integration Center), provide novel information for unwinding the intricacies associated with Rickettsia pathogenesis, expanding the range of potential diagnostic, vaccine and therapeutic targets.
Citation: Gillespie JJ, Williams K, Shukla M, Snyder EE, Nordberg EK, et al. (2008) Rickettsia Phylogenomics: Unwinding the Intricacies of Obligate Intracellular Life. PLoS ONE 3(4): e2018. doi:10.1371/journal.pone.0002018
Editor: Adam J. Ratner, Columbia University, United States of America
Received: February 6, 2008; Accepted: March 7, 2008; Published: April 16, 2008
Copyright: © 2008 Gillespie et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work is funded through NIAID contract HHSN266200400035C to BSS and NIH grants AI59118 and AI17828 to AFA.
Competing interests: The authors have declared that no competing interests exist.
* To whom correspondence should be addressed. E-mail: pvittata@hotmail.com
Introduction
Rickettsiae are a group of organisms belonging to the class Alphaproteobacteria, a large and metabolically diverse group of gram-negative bacteria [1]–[3]. Within Alphaproteobacteria, the order Rickettsiales comprises three families: Holosporaceae, Anaplasmataceae and Rickettsiaceae [4], of which Rickettsia spp. are grouped in the latter, along with the monotypic genus Orientia, the scrub typhus agent [5]. Robust phylogenetic analysis further suggests that the abundant free-living marine bacterioplankton Pelagibacter ubique and mitochondria are early-branching groups of the order [6]. Species in the genus Rickettsia are obligate intracellular symbionts of plants [7], amoebae [8], [9], arthropods [e.g., 10]–[13], annelids [14], vertebrates [15] and likely many other organisms [16]. Most Rickettsia-containing vertebrates are secondary hosts that acquired these bacteria via blood-feeding arthropods or the transdermal inoculation or inhalation of the feces of infected arthropods. Rickettsia spp. are often parasitic in the secondary vertebrate host [e.g., 17], and their pathogenicity to some extent has been well studied. In particular, human rickettsial infections are known to cause many diseases, including epidemic typhus (R. prowazekii), murine typhus (R. typhi), murine typhus-like (R. felis), rickettsial pox (R. akari), Rocky Mountain spotted fever (R. rickettsii), Boutonneuse fever (R. conorii), and North Asian tick typhus (R. sibirica). These virulent species of rickettsiae are of great interest both as emerging infectious diseases [18] and for their potential deployment as bioterrorism agents [19], [20].
Due to both small genome size and medical importance, ten genome sequences from Rickettsia spp. have been published and annotated in the last decade [9], [21]–[27], providing a foundation to study the evolutionary history of these lineages through comparative genomics. Recently, Gillespie et al. [28] proposed a revision to the long-standing classification of Rickettsia by erecting the transitional group (TRG) as a distinct lineage that shares immediate ancestry with the members of the spotted fever group (SFG) rickettsiae. Coupled with the typhus group (TG) and ancestral group (AG) rickettsiae, these four rickettsial lineages comprising 10 sequenced genomes present an opportunity to create a database that encompasses the distribution of the predicted open reading frames (ORFs) across all ten annotated genomes (Figure 1).
Figure 1. Venn diagram depicting 15 intersections for the four rickettsial groups.
Classification scheme based on molecular phylogeny estimation [28], the topology of which is shown in the lower left; AG = ancestral group, TG = typhus group, TRG = transitional group, SFG = spotted fever group. Genome codes are as follows: Br = R. bellii str. RML369-C, Bo = R. bellii str. OSU 85-389, Ca = R. canadensis str. McKiel, Pr = R. prowazekii str. Madrid E, Ty = R. typhi str. Wilmington, Ak = R. akari str. Hartford, Fe = R. felis str. URRWXCal2, Ri = R. rickettsii str. Sheila Smith CWPP, Co = R. conorii str. Malish 7, and Si = R. sibirica str. 246. Arthropod hosts are illustrated for each genome, and strains known to harbor plasmids are depicted.
doi:10.1371/journal.pone.0002018.g001
Establishing orthology across multiple genomes serves not only to identify genes with shared evolutionary histories, but also facilitates genome annotation [29], [30], and significant attention has focused on algorithms for creating orthologous groups (OGs). Recent work has centered on the following four aspects: i) overall improvement of OG assignment in the face of paralogy, ii) building tools for the cross-querying of taxon-specific databases, iii) creating databases that house specific gene or protein profiles for facilitating the identification of orthologs in novel sequences, and iv) the inclusion of phylogeny estimation into the processes of assigning orthology and detecting paralogy.
At the PathoSystems Resource Integration Center (PATRIC) [31], OGs have been preliminarily established for several groups of organisms, including Rickettsia spp. The advantage of a Rickettsia-specific database lies not only in the ability to query exclusively against the 10 genomes currently annotated in our system, but also to evaluate the results of several algorithmic approaches that create OGs. Furthermore, PATRIC offers continued updates to the annotation of rickettsial genes and proteins, and provides multiple sequence alignments as well as phylogenetic trees, when applicable, for each OG consisting of two to ten rickettsial taxa. The database will continually evolve with the addition of newly sequenced rickettsial genomes, with existing OG assignments driving the curation process of raw genome data.
In the present study, we report the rickettsial OGs (RiOGs) in conjunction with a highly robust phylogeny of the core rickettsial genes, providing an evolutionary framework for interpreting the genomic characteristics of the four main lineages of Rickettsia. These data highlight the genetic anomalies previously characterized for this genus, such as extremely reduced genomes and the high presence of putative pseudogenes, and also reveal novel characteristics including the lack of group-specific virulence factors and high occurrence of lateral transfer between groups that harbor plasmids (AG and TRG rickettsiae). Information on the conserved core genes, as well as those that may be involved in specific functions that define monophyletic groups, host associations, and plasmid-related behavior, will be valuable resources for future laboratory work (e.g., development of vaccines, diagnostics and therapeutics) as well as further evolutionary studies of this intriguing obligate intracellular bacterial group.
Results and Discussion
Synteny and Phylogeny of Rickettsia Genomes
Whole genome alignments for the ten analyzed Rickettsia taxa reveal highly conserved colinearity in six of the seven derived species (sans R. bellii and R. canadensis) with minimal gene rearrangements, most of which occur near the predicted origin of replication termination (Figure 2). However, the R. felis genome contains several long-range symmetrical inversions in the central region of the alignment that are not found in other taxa. Removal of R. felis from the alignment illustrates the highly conserved synteny across the derived rickettsia taxa (Figure S1-A). Furthermore, switching the positions of R. akari and R. felis in the alignment (Figure S1-B) demonstrates that these central inversions in R. felis, as well as a large genome size, are autapomorphic (uniquely derived) traits within derived rickettsiae. Among the three AG rickettsiae, R. canadensis (formerly R. canada) is more colinear with the derived taxa than it is to either R. bellii strain. Like R. felis, R. canadensis contains several autapomorphic symmetrical inversions in the central region of the alignment, yet they are smaller than the long-range inversions found in R. felis. As previously reported [32], R. bellii str. RML369-C shares little colinearity with other rickettsial genomes, and our analysis of both R. bellii genomes is in agreement with this observation. Despite several long and short range inversions between the R. bellii str. RML369-C and R. bellii str. OSU 85-389 genomes, few gene positions are shared with R. bellii and R. canadensis or the derived taxa (Figure 2), and switching the positions of the R. bellii strains in the alignment does not result in more conserved synteny between either strain and the derived taxa (Figure S1-C, D).
Figure 2. Alignment of 10 rickettsial genomes.
Taxa are in the same position as in estimated trees in Figure 3, with taxon abbreviations explained in the Figure 1 legend. Alignment created using Mauve [189] after reindexing the R. sibirica genome (see text for details).
doi:10.1371/journal.pone.0002018.g002
Phylogenetic analyses implementing both maximum likelihood and parsimony of the 731 representative core rickettsial proteins (discussed below) resulted in robust estimates for these 10 taxa (Figure 3). The estimated tree topologies are identical in branching pattern and are congruent with the tree from our previous analysis of 716 fewer genes [28], suggesting that ten or more concatenated (and well-behaved, with high signal-to-noise ratio) genes are sufficient for obtaining a robust phylogenetic estimate for these rickettsial taxa. Thus, our recent classification scheme for Rickettsia consisting of 4 major groups (AG rickettsiae: R. bellii str. RML369-C, R. bellii str. OSU 85-389, R. canadensis str. McKiel; TG rickettsiae: R. prowazekii str. Madrid E, R. typhi str. Wilmington; TRG rickettsiae: R. akari str. Hartford, R. felis str. URRWXCal2; SFG rickettsiae: R. rickettsii str. Sheila Smith CWPP, R. conorii str. Malish 7, R. sibirica str. 246) is substantiated with a phylogenomic approach. In what follows, we use this evolutionary framework to analyze the distribution and relative conservation of all predicted genes for these ten rickettsial genomes.
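The concatenation step that underlies a multi-protein phylogeny of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `concatenate_alignments`, the dictionary layout and the toy sequences are all invented here. The sketch only joins pre-aligned protein sequences per taxon into one row each, which a tree-building program would then take as input.

```python
def concatenate_alignments(alignments, taxa):
    """Join per-protein alignments into one supermatrix row per taxon.

    alignments: list of dicts mapping taxon code -> aligned sequence
                (all rows within one dict must have equal length).
    taxa:       taxon codes that must be present in every alignment,
                as is the case for representative core OGs.
    """
    parts = {taxon: [] for taxon in taxa}
    for index, alignment in enumerate(alignments):
        missing = [t for t in taxa if t not in alignment]
        if missing:
            raise ValueError(f"alignment {index} lacks taxa: {missing}")
        lengths = {len(alignment[t]) for t in taxa}
        if len(lengths) != 1:
            raise ValueError(f"alignment {index} has unequal row lengths")
        for taxon in taxa:
            parts[taxon].append(alignment[taxon])
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy input: three genome codes from Figure 1, sequences invented.
toy = [
    {"Pr": "MK-L", "Ty": "MKAL", "Ri": "MRAL"},
    {"Pr": "GGS", "Ty": "GAS", "Ri": "G-S"},
]
matrix = concatenate_alignments(toy, ["Pr", "Ty", "Ri"])
```

Requiring every taxon in every alignment mirrors the use of representative core proteins, where each genome contributes exactly one sequence per OG.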
Figure 3. Estimated phylogenies of ten rickettsial taxa based on 731 representative core proteins.
(A) Tree from Bayesian analysis. Three MCMC chains were primed with a neighbor-joining tree and run independently for 25000 generations in model-jumping mode. Burn-in was attained by 2500 generations for all chains, and a single tree topology with exclusive use of the Jones substitution model was observed in post burn-in data. The consensus tree shown here thus has 100% support for every branch. Branch support is from the distribution of posterior probabilities from all trees minus the burn-in. (B) Tree from exhaustive search using parsimony. Branch support is from one million bootstrap replicates.
doi:10.1371/journal.pone.0002018.g003
Predicted OGs: Conservation and Representation
In the analysis of the rapidly growing list of rickettsial genomes we determined that OrthoMCL, a program that applies the Markov clustering algorithm of Van Dongen [33] to resolve the many-to-many orthologous relationships present within cross genome comparisons [34], outperformed more traditional approaches to establishing OGs, such as bidirectional best BLAST hits with and without cliques. Thus, we show here the results generated by OrthoMCL only, which grouped 12887 ORFs into 2082 total OGs (Table 1). The bulk (88%) of these OGs are representative (Figure 4A), meaning they include only one CDS per strain, thus ranging in membership from 2–10 sequences. The remaining 12% of the OGs are non-representative (Figure 4B) and include multiple predicted ORFs from at least one member. Categorization of the OGs into two classes based on distribution across the rickettsial tree and other attributes, such as presence of plasmids and common arthropod hosts (Figure 4C–D), reveals that 69% of the OGs are comprised of single rickettsial groups (e.g., AG, TG, TRG, and SFG), shared rickettsial groups (subgeneric), plasmid-harboring genomes, and genomes with common arthropod hosts (Table 1). These class 1 OGs (C1OGs) contain 76% of the predicted ORFs grouped into OGs by OrthoMCL, suggesting that our criteria for distinguishing biologically interesting protein families based empirically on robust phylogeny estimation, presence of extra-chromosomal DNA and shared arthropod hosts is valid. The remaining ORFs grouped into class 2 OGs (C2OGs) depict gene families drifting or sporadically lost from the core genetic repertoire of the rickettsial ancestor [32] or genes acquired laterally (Figure S2). Interestingly, while the majority (71%) of representative OGs qualify as C1OGs, the non-representative OGs are distributed within C1OGs and C2OGs in near equal frequency (Table 1), suggesting minimal conservation for gene duplications and laterally acquired genes in these rickettsial genomes.
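The two categorizations described above can be illustrated with a short sketch. It is a simplification under stated assumptions: the function names and ORF identifiers are hypothetical, and the group test covers only OGs whose genomes form a union of complete rickettsial groups. The plasmid-associated, host-associated and R. bellii-only categories that the paper also counts among C1OGs are not modelled.

```python
from collections import Counter

# Group membership as given in the Figure 1 legend.
GROUPS = {
    "AG": {"Br", "Bo", "Ca"},
    "TG": {"Pr", "Ty"},
    "TRG": {"Ak", "Fe"},
    "SFG": {"Ri", "Co", "Si"},
}


def representation(members):
    """Label an OG by whether any genome contributes more than one ORF.

    members: list of (genome_code, orf_id) pairs belonging to one OG.
    """
    if len(members) < 2:
        raise ValueError("an OG needs at least two ORFs; one ORF is a singleton")
    per_genome = Counter(genome for genome, _orf in members)
    if max(per_genome.values()) == 1:
        return "representative"
    return "non-representative"


def spans_whole_groups(members):
    """True if the genomes present form a union of complete groups."""
    present = {genome for genome, _orf in members}
    covered = set()
    for genomes in GROUPS.values():
        if genomes <= present:
            covered |= genomes
    return covered == present
```

Under this reading, an OG found only in Pr and Ty is representative and group-consistent, whereas one found only in Pr and Ri has the patchy distribution associated with C2OGs.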
Figure 4. Illustration of representative and non-representative OGs and their categorization into Class 1 and Class 2 OGs.
Taxon abbreviations are explained in the Figure 1 legend. Dark circles depict gene presence, while open circles depict gene absence. (A) Representative OGs: orthologous groups with only one ORF per included genome. Our analysis includes ten rickettsial genomes, thus representative OGs only include from 2–10 ORFs. Four examples are shown. (B) Non-representative OGs: orthologous groups with multiple ORFs from at least one included genome, comprised of either recent (orthologs) or distant (paralogs) gene duplications (dupl). False singleton OGs are comprised of only one taxon, but with multiple ORFs from that taxon (example on right). Four examples are shown. (C) Class 1 OGs (C1OGs): orthologous groups comprising single rickettsial groups (e.g., AG, TG, TRG, and SFG), shared rickettsial groups (subgeneric), plasmid-harboring genomes, and genomes with common arthropod hosts. Two representative (left) and two non-representative (right) C1OGs are shown. (D) Class 2 OGs (C2OGs): orthologous groups with patchy distribution across the rickettsial tree, depicting gene losses and/or genes acquired laterally. Two representative and two non-representative C2OGs are shown.
doi:10.1371/journal.pone.0002018.g004
Table 1. Distribution of representative and non-representative OGs predicted across 14354 ORFs from ten rickettsial genomes, and their categorization into Class 1 and Class 2 OGs.
doi:10.1371/journal.pone.0002018.t001
The RiOGs range in membership from two to 31 ORFs, with few (<3%) OGs exceeding 10 ORFs (Table 2). Representative C1OGs comprise a substantial portion (64%) of the OGs with membership of 10 or fewer ORFs. Regarding the OGs with more than 10 members, a range from 4% (R. prowazekii) to 32% (R. conorii) illustrates the frequencies at which a particular rickettsial genome contributes to non-representation. As expected due to their smaller genome sizes and few gene duplications [21], [25], TG rickettsiae make little contribution (avg. 5%) to larger non-representative OGs as compared to AG (avg. 19%), TRG (avg. 17%) and SFG (avg. 31%) rickettsiae (Table 2). Thus, these three latter groups have genomes more tolerant of multicopy genes, particularly those resulting from transposases and other insertion sequences, which act to produce elevated levels of paralogous genes. For instance, analysis of the distribution of RiOGs containing genes associated with mobile DNA and/or horizontal gene transfer (HGT), such as genes coding for proteins with ankyrin (ANK) and tetratricopeptide repeat (TPR) motifs, proteins with rickettsial palindromic elements (RPE), proteins associated with transposable elements (TNP), proteins of toxin-antitoxin modules (TA), and phage related elements, revealed that they are nearly non-existent in TG rickettsial genomes (Table 3). The remaining three lineages, all purportedly containing some species that harbor plasmids, have elevated levels of most of these gene groups compared to TG rickettsiae. Interestingly, nearly half (47%) of the C2OGs are comprised of these six gene groups, while only a small portion of the C1OGs (5%) and singletons (4%) contain them (Table 3). Given the probable lateral inheritance of many of these genes, either as facilitators or products of HGT, it is evident that they are less conserved and of less importance to overall rickettsial fitness and survival. However, their contribution to species- and strain-specific pathogenicity cannot be overlooked. Interestingly, our observation that these more promiscuous gene families tend to occur predominantly within C2OGs is congruent with a recent study demonstrating that barriers to bacterial HGT are more stringent for single copy genes [35].
Table 2. Breakdown of membership (no. ORFs) across 2082 rickettsial OGs.
doi:10.1371/journal.pone.0002018.t002
Table 3. Distribution across 10 rickettsial genomes of OGs and singletons containing proteins with ankyrin (ANK) and tetratricopeptide repeat (TPR) motifs, proteins with rickettsial palindromic elements (RPE), proteins associated with transposable elements (TNP), proteins of toxin-antitoxin modules (TA), and phage related proteins.
doi:10.1371/journal.pone.0002018.t003
A comparison of the distributions of both representative and non-representative C1OGs and their associated singletons uncovers the high occurrence of singleton genes (53%) per representative C1OGs (Figure 5). While many singletons may be the product of gene overprediction (discussed below), some could possibly have important species- or strain-specific functions, such as host manipulation. “False singletons”, which depict non-representative OGs with all members from a single genome (Figure 4C), contribute less (17%) towards non-representation when identical genes from R. felis plasmids pRF and pRFδ are not considered (for speculation on the existence of pRFδ see Gillespie et al. [28]). Thus the biological causes of non-representation, such as HGT and gene duplication, tend to occur more within gene families common across multiple rickettsial genomes rather than in unique genes within individual genomes. This is congruent with our determination of the high occurrence within C2OGs of six gene families typically associated with mobile DNA and/or HGT (above).
Figure 5. Comparison of the distributions of 1300 representative and 145 non-representative class 1 OGs (C1OGs), 66 false singletons, and 1467 singleton ORFs.
Slices depict 16 generic and subgeneric groups, false singletons, singletons, plasmid associated groups, and two host-related groups, with outer circle colors depicted in schema. Taxon abbreviations, including subgeneric groups, are explained in the Figure 1 legend. (A) Distribution of 1300 representative C1OGs and 1467 singletons. (B) Distribution of 79 non-representative C1OGs and 66 false singletons.
doi:10.1371/journal.pone.0002018.g005
The Nature of Non-Representation
The degree of non-representation recovered by OrthoMCL is not a surprise as Rickettsia genomes are notorious for being highly reductive [e.g., 36]–[38], having a high occurrence of split genes and pseudogenes [e.g., 22], [23], [32], [39], [40] and limited conservation in important host-recognition proteins such as rickettsial outer membrane protein A (rOmpA) and other cell surface antigens (Scas) [e.g., 41]–[57]. Coupled with this, some of the more recently sequenced genomes (namely both R. bellii strains and R. felis) are riddled with gene rearrangements and elevated levels of repetitive elements and transposases [9], [27], and the staggering degree of repetitive sequences and gene duplications in the recently sequenced genome of Orientia tsutsugamushi [58] suggests the old paradigms for genome reduction and synteny in Rickettsiaceae need reevaluation. Furthermore, as we recently predicted [28], new evidence is mounting for the presence of plasmids in several members of AG, TRG and SFG rickettsiae (reviewed in Baldridge et al. [59]), with some proteins having high similarity to counterparts encoded on rickettsial chromosomes [e.g., 28], [60]. All of these factors confound the accurate assignment of gene orthology across genomes, so our results should be viewed as algorithm-dependent and as having required further manual scrutiny and adjustment.
Manual inspection of the 259 non-representative OGs via multiple sequence alignment of each specific case revealed the high occurrence of split genes versus true gene duplications (Table 4; Table S1). Including spurious duplications from the identical R. felis pRF and pRFδ plasmids, 387 problematic ORFs were eliminated or stitched together to create pseudogene ORFs, resulting in only 80 remaining non-representative OGs defined by true gene duplications. Notably, elimination of identical pRF and pRFδ plasmid genes created 33 additional R. felis singletons. After “repairing” OGs defined by split ORFs, four distributions contained the majority of C1OGs, illustrating the instances of gene decay from the core, -TG, TRG+SFG, and SFG distributions (Figure 6). Regarding the repaired OGs with a core distribution, nearly half of the split genes were from the R. bellii str. OSU 85-389 genome and include critical genes such as those encoding alanyl- and leucyl-tRNA synthetases and one of the five virB6 components of the type IV secretion system. OGs containing split genes with a -TG distribution include two proteins possibly involved in DNA transformation: a ComEC/Rec2-related protein and a putative DNA processing protein DprA, plus two phage related proteins and a TPR motif-containing protein. This illustrates that genes deleted from the TG genomes involved in conjugation or other methods of foreign DNA uptake are in the process of decaying from the remaining rickettsial genomes. Through the comparison of the proportion of split genes to gene duplications per rickettsial genome (Table 5), it is evident that split genes occur more frequently, particularly in SFG rickettsiae, and that both split genes and gene duplications are nearly nonexistent in TG rickettsiae. Interestingly, the genomes with plasmids and elevated levels of transposases and related elements, namely R. felis and R. bellii, also have elevated levels of gene duplications.
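The distinction between a split gene and a true duplication can be illustrated with a simple coverage heuristic. The curation reported here was done by manual inspection of alignments, so the sketch below is only an assumed automation of that idea: the function names, the half-open interval convention and the 0.2 cutoff are all hypothetical. Two ORFs from the same genome that cover largely separate stretches of the OG alignment look like fragments of one gene, whereas two that cover the same stretch look like copies.

```python
def overlap_fraction(a, b):
    """Fraction of the shorter interval that lies inside the overlap.

    a, b: (start, end) alignment columns, half-open, for two ORFs
    from the same genome within one non-representative OG.
    """
    len_a, len_b = a[1] - a[0], b[1] - b[0]
    if len_a <= 0 or len_b <= 0:
        raise ValueError("intervals must have positive length")
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    return overlap / min(len_a, len_b)


def likely_cause(a, b, max_overlap=0.2):
    """Label a same-genome ORF pair as a candidate split gene or duplication.

    max_overlap is an arbitrary illustrative cutoff, not a published value.
    """
    if overlap_fraction(a, b) <= max_overlap:
        return "split gene"
    return "duplication"
```

For example, ORFs covering columns (0, 300) and (310, 900) would be flagged as a candidate split gene, while ORFs covering (0, 850) and (20, 900) would be flagged as a candidate duplication. Any such call would still need checking against the alignment itself.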
Figure 6. Manual curation of 259 non-representative OGs predicted by OrthoMCL.
Schema depicts 179 OGs repaired to representative after stitching together split ORFs (larger pie chart) and remaining true non-representative OGs defined by in-paralogs.
doi:10.1371/journal.pone.0002018.g006
Table 4. Manual evaluation of 259 non-representative OGs across ten rickettsial genomes.
doi:10.1371/journal.pone.0002018.t004
Table 5. Characterization of 259 non-representative OGs per ten rickettsial genomes.
doi:10.1371/journal.pone.0002018.t005
Core and Group-Specific C1OGs
The distributions of representative (1300) and non-representative (79) C1OGs are shown over our estimated phylogeny (Figure 7). Singletons (1467) are also shown but are discussed in a separate section below. Of the 1379 C1OGs, 31% are annotated as hypothetical proteins (HPs), suggesting that a significant number of even the conserved genes within these rickettsial genomes remain to be characterized. Not considering the bellii C1OG, which contains genes unique to the R. bellii genomes, the proportion of HPs within the C1OGs decreases to 18%. The core and lineage specific C1OGs are discussed below.
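The counts reported so far can be cross-checked with a few lines of arithmetic. The figures below are taken from the text and figure legends; the variable names are chosen here only for readability.

```python
# Counts as reported in the text.
total_orfs = 14354            # predicted ORFs across the ten genomes
grouped_orfs = 12887          # ORFs placed into OGs by OrthoMCL
representative_ogs = 1823
non_representative_ogs = 259
repaired_ogs = 179            # non-representative OGs repaired by stitching split ORFs

total_ogs = representative_ogs + non_representative_ogs
singletons = total_orfs - grouped_orfs
true_non_representative = non_representative_ogs - repaired_ogs
c1ogs = 1300 + 79             # representative + non-representative C1OGs
core_ogs = 731 + 21           # representative + non-representative core OGs

pct_representative = round(100 * representative_ogs / total_ogs)
pct_non_representative = round(100 * non_representative_ogs / total_ogs)
pct_representative_in_c1 = round(100 * 1300 / representative_ogs)

print(total_ogs, singletons, true_non_representative, c1ogs, core_ogs)
print(pct_representative, pct_non_representative, pct_representative_in_c1)
```

The derived values agree with the totals quoted in the text: 2082 OGs, 1467 singletons, 80 remaining non-representative OGs, 1379 C1OGs and 752 core OGs, with 88% of OGs representative, 12% non-representative, and 71% of representative OGs falling in C1OGs.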
Figure 7. Distribution of representative and non-representative class 1 OGs (C1OGs) and singleton ORFs over estimated rickettsial phylogeny.
Boxes depict the distribution of phylogenetic groups, singletons, plasmid associated groups, and host-related groups: Red = AG rickettsiae, aquamarine = TG rickettsiae, blue = TRG rickettsiae, brown = SFG rickettsiae, gray = higher-level groupings, light green = R. bellii strains only. Orange boxes depict genes found on the pRF plasmid of R. felis str. URRWXCal2 and chromosomes R. felis and both R. bellii strains (as of this publication the R. bellii plasmids remain unavailable). Genes specific to single rickettsial genomes (singletons) are in yellow boxes, with taxon abbreviations explained in the Figure 1 legend. Host specific groups are defined by green (insect) and tan (tick) boxes. Genome statistics were compiled from the PATRIC and NCBI databases. Cladogram is based on trees shown in Figure 3. Inset in dashed box describes general schema for each box. *Total R. felis genome size: 1,485,148 bp = chromosome; 62,829 bp = pRF and 39,263 bp = pRFδ.
doi:10.1371/journal.pone.0002018.g007
Core rickettsial genes.
OrthoMCL grouped 731 representative and 21 non-representative protein families that are present in all ten analyzed rickettsial genomes (Table S2). Thus, the genes encoding these proteins define the foundation of rickettsial biology, such as “house-keeping” functions, as well as rudimentary processes in host cell recognition, invasion and survival (but not necessarily virulence as not all Rickettsia spp. are known pathogens). The distribution of the assigned cellular functions of each of these core proteins provides insight on the conservation of cellular activities relative to other bacteria (Figure 8A). Not surprisingly, OGs involved in translation represent the largest functional category (16.14%), as other cellular functions such as amino acid (2.6%), carbohydrate (2.1%), nucleotide (2.3%), and lipid (2.2%) synthesis are less necessary when many of these resources can be obtained from host cells [61], [62]. Analyzing a crude depiction of the R. felis proteome, Ogawa et al. [40] reached a similar observation as their 172 identified proteins sorted into cellular function categories similar to those assigned for our core proteins, although with far fewer members per category (Figure 8B). The core rickettsial protein distribution across cellular function categories is also similar to another obligate intracellular pathogen, Chlamydia trachomatis, suggesting that this lifecycle is defined by reduction of many genes with conserved cellular functions (save translation) in facultative intracellular (Yersinia pestis) and extracellular (Escherichia coli) pathogenic bacteria. The percentage of ORFs coding for metabolic genes is lower in the obligate intracellular bacteria, with the exception of the coenzyme transport/metabolism and lipid transport/metabolism genes of Chlamydia, which equal and exceed that of the two larger genomes, respectively.
Figure 8. Bioinformatic analysis of core representative OGs.
(A) Assignment of 731 core representative RiOGs to predicted cellular function categories. Format follows that established at the COG database (NCBI) except for cf = combined function and rpe = rickettsial palindromic element. (B) Comparison of the distribution of cellular function categories across 731 core rickettsial OGs (Ri), a recent protein expression profile for R. felis [40] (Rf), and COGs for three other bacteria: Escherichia coli (Ec), Yersinia pestis (Yp) and Chlamydia trachomatis (Ct). Inset at left shows the number of genes per genome for cellular function categories involved in organic and inorganic transport and metabolism (E, F, G, H, I, P, and Q) followed by the percentage these genes comprise of total protein-encoding genes. Results from a six-way regression analysis are shown in the right inset.
doi:10.1371/journal.pone.0002018.g008
AG rickettsiae.
Based on phylogeny estimation of over 30 proteins that placed R. canadensis basal to the TG, TRG and SFG rickettsiae, we categorized it with both R. bellii strains in the AG rickettsiae [28], a result recovered here and consistent with several previous studies [3; consensus tree of Vitorino et al. [63]]. Conversely, our analysis of OG distribution recovered only two proteins that are unique to AG rickettsiae: RiOG_1416 (Type I restriction-modification system, M subunit) and RiOG_1429 (F pilus assembly protein TraB). RiOG_1416 is truncated in R. bellii str. OSU 85-389 and extremely truncated in R. canadensis. Similarly, RiOG_1429 is truncated in R. canadensis; thus it is unlikely that either ORF is an important signature for AG rickettsiae. Furthermore, while both strains of R. bellii share 321 unique representative protein families (Figure 7, Table S3), R. canadensis only shares two unique proteins with the remaining derived rickettsiae: RiOG_925 (COG0419: ATPase involved in DNA repair) and RiOG_927 (methyltransferase family protein), with the latter likely part of a multigene family with other R. bellii homologs. Thus, OG distribution provides little evidence for placing R. canadensis either within AG rickettsiae or as derived. For instance, of the three derived rickettsial groups, R. canadensis shares more OGs with SFG (13; Figure S2-C8) than with either TG (3; Figure S2-B16) or TRG (5, Figure S2-B15) rickettsiae. However, the three OGs shared between R. canadensis and TG rickettsiae are all unique sugar transferases, and all three genomes share an unprecedented 52 lost OGs relative to the remaining seven rickettsial genomes (Table 6; Figure S2-F3). Interestingly, R. canadensis shares zero lost genes with either TRG or SFG rickettsiae. It also shares with R. prowazekii a unique split gene, scaI, that is the most conserved member of the scas and is present in all analyzed Rickettsia spp. [57]. Thus, while phylogeny estimation places R. canadensis basal to the TG, TRG and SFG rickettsiae, and common OGs suggest an affinity to SFG and TRG rickettsiae over TG rickettsiae, the mode of gene loss across the lineages branching off after R. bellii suggests the position of R. canadensis within our generated phylogeny is well supported, but with possible affinities with TG rickettsiae, which were originally suggested based on serological cross reactivity studies [64]. Accordingly, phylogenetic analysis and signature proteins alone should not be solely used to characterize rickettsial groups, as shared absence of genes may reflect relatedness that is difficult to detect otherwise in these highly reductive genomes.
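The idea of shared gene loss can be made concrete with a small presence/absence query. The sketch below takes one strict reading of "lost in a lineage", namely that an OG is absent from every genome in the lineage and present in every other genome; the paper's own criterion for Table 6 may differ. The function name and the OG identifiers in the toy data are hypothetical.

```python
# Genome codes from the Figure 1 legend.
ALL_GENOMES = {"Br", "Bo", "Ca", "Pr", "Ty", "Ak", "Fe", "Ri", "Co", "Si"}


def ogs_lost_in(og_presence, lineage):
    """Return OG ids absent from all of `lineage` and present everywhere else.

    og_presence: dict mapping OG id -> iterable of genome codes containing it.
    lineage:     iterable of genome codes, e.g. {"Ca", "Pr", "Ty"}.
    """
    others = ALL_GENOMES - set(lineage)
    return sorted(
        og for og, present in og_presence.items() if set(present) == others
    )


# Toy presence table (invented): only og_a fits the strict definition.
lineage = {"Ca", "Pr", "Ty"}
remaining = ALL_GENOMES - lineage
toy_presence = {
    "og_a": remaining,
    "og_b": ALL_GENOMES,
    "og_c": remaining - {"Br"},
}
lost = ogs_lost_in(toy_presence, lineage)
```

Relaxing the equality test to a subset test would also admit OGs that are patchily distributed among the remaining seven genomes, which is a looser notion of shared loss.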
Table 6. OGs missing in the lineage spanning R. canadensis and TG rickettsiae.
doi:10.1371/journal.pone.0002018.t006
Interestingly, Vitorino et al. [63] recently demonstrated an affinity between R. canadensis and R. helvetica based on phylogeny estimation from eight genes, although they concluded that the phylogenetic position of R. canadensis was unstable, which is consistent with previous studies. For instance, like SFG rickettsiae, R. canadensis was isolated from ixodid ticks and is maintained transstadially and transovarially [65], [66], grows within the nuclei of its host [65], and contains both rOmpA and rOmpB genes [67], [68]. However, like TG rickettsiae, R. canadensis grows abundantly in yolk sac, lyses red blood cells, is susceptible to erythromycin, and forms smaller plaques as compared to SFG rickettsiae [69]. Genomic characteristics are just as anomalous, as despite sharing the same G+C% [26], [69] and only a slightly larger genome size than TG rickettsiae (Figure 7), R. canadensis shares more common repetitive elements with SFG rickettsiae genomes than with any other group [26] and has many similar genes found within the tra cluster of R. massiliae [70]. Switching the position of R. canadensis in our genome alignment to reflect a derived relationship relative to TG rickettsiae did not improve synteny with the other rickettsial genomes, and despite a large central inversion, R. canadensis gene order is highly conserved with most of the derived taxa (Figure S1-D). In an effort to test a putative affinity between R. canadensis and R. helvetica (genome sequence unavailable), we selected 16 existing full or partial gene sequences for R. helvetica and estimated a phylogeny (Figure 9). R. helvetica is supported as basal to the remaining SFG rickettsiae in an otherwise identical phylogeny estimated from the 731 core rickettsial genes (Figure 3), thus refuting an affinity between R. canadensis and R. helvetica.
The recent phylogenies estimated from 16S rDNA and groEL nucleotide sequences, the VirB4 protein and 14 concatenated proteins of the T4SS complex, and entire genome sequences placed R. canadensis between TG and TRG rickettsiae [26]; however, R. bellii was not sampled, and the absence of an ancestral taxon likely affected character polarity. Thus, given our estimation of phylogeny from all available annotated rickettsial genomes, we are confident in the placement of R. canadensis as basal to the TG, TRG and SFG rickettsiae, although limited similarity is apparent to both R. bellii genomes as revealed by OG distribution and synteny. It is not unreasonable to predict that R. canadensis will ultimately group within a fifth distinct rickettsial group once more genomes are sequenced from lesser-known rickettsiae, particularly species non-pathogenic to humans.
Figure 9. Phylogeny estimation of the ten analyzed rickettsial taxa plus R. helvetica and R. australis based on 16 proteins.
See Table S13 for gene names and sequence accession numbers. Tree estimated under parsimony (see text).
doi:10.1371/journal.pone.0002018.g009
TG rickettsiae.
Despite being distinct from the other rickettsial groups with their highly reductive genomes and strictly insect-specific lifestyles, TG rickettsiae were predicted to contain only three unique representative OGs: a putative GTP pyrophosphokinase (RiOG_2080) and two HPs (RiOG_2081 and RiOG_2082). RiOG_2080 is part of a probable multigene family that is duplicated in most rickettsial genomes. These enzymes catalyze the synthesis of guanosine 5′-triphosphate 3′-diphosphate (pppGpp) as well as guanosine 3′,5′-bispyrophosphate (ppGpp) by transferring pyrophosphoryl groups from ATP to GTP or GDP, respectively [71], functioning as mediators of the stringent response that coordinate a wide range of cellular activities in reaction to changes in nutritional abundance [72]. While GTP pyrophosphokinases are common in multiple variable copies across the sampled genomes, the role that lineage-specific copies play in accommodating the different modes of intracellular replication and intercellular spreading by different rickettsial groups is worth exploring. RiOG_2081 is an uncharacterized protein conserved in a limited number of other bacteria (COG3274) and unknown from non-TG rickettsiae. The distribution of this protein, a putative membrane-associated acyltransferase, in many pathogenic bacterial species and one bacteriophage, PhiV10, is interesting (Table 7). Finally, RiOG_2082 is a small putative ORF that has no BLAST hits in other organisms, with the start codon missing in R. typhi.
Table 7. Results of a BLASTP search for RiOG_2081 using RP338 (R. prowazekii) as a query.
doi:10.1371/journal.pone.0002018.t007
While a wealth of unique genes seemingly does not define TG rickettsiae, 53 unique gene loss events may offer insight into the streamlined manner of their evolution (Table 6). The loss of the Arp2/3 complex activating protein, rickA, from TG rickettsiae has been well-documented, and distinguishes this group in its mode of host cell spreading [73], [74]. Interestingly, our comparative analysis has revealed two other curious proteins that are present and conserved in all other non-TG rickettsiae genomes. The first is RiOG_897, a putative trichohyalin. Trichohyalins are intermediate filament-associated proteins found predominantly in the hair follicle cells of mammals [75], [76] but also expressed in the hard palate, tongue, nail bed, and a suite of pathological epidermal tissues [77], [78]. We discuss trichohyalins further below with regard to insect-associated rickettsiae, which contain a unique trichohyalin-like homolog that is different from the gene found in all other non-TG rickettsiae. The second interesting OG (RiOG_901) found exclusively in non-TG rickettsiae is an ecotin-like protein. Ecotin is a dimeric periplasmic protein described in Escherichia coli that belongs to the protease inhibitor I11 (ecotin) family (PF03974). Ecotin inhibits several pancreatic serine proteases, including chymotrypsin, trypsin, elastases, factor X, kallikrein, as well as a variety of other proteases [79]–[81]. Eggers et al. [82] have shown that ecotin protects E. coli from neutrophil elastase (NE), a mammalian serine protease demonstrated to be important for neutrophil killing of several gram-negative bacteria. Specifically, NE cleaves OmpA, causing increased permeability of the bacterial outer membrane [83]. Once NE translocates across the vulnerable outer membrane, it functions in inhibiting bacterial cell growth and repair, causing cell death. The presence of ecotin in the periplasm inhibits NE function, thus fo