The authors have the following interests: Clotilde Teiling is a current and Timothy T. Harkins a former employee of 454 Life Sciences, one of the funding sources for the study. Sequencing and genome assembly for this project were provided by 454 Life Sciences. Clotilde Teiling currently serves as Marketing Manager Technology - Sequencing and Arrays at 454 Life Sciences. Timothy T. Harkins previously served as Director of Marketing at 454 Life Sciences and is currently Director of Research and Development at Life Technologies. There are no further patents, products in development or marketed products to declare. This does not alter the authors’ adherence to all the PLoS ONE policies on sharing data and materials, as detailed online in the guide for authors.
Conceived and designed the experiments: RAE JAE MTF. Performed the experiments: EGW CH CT TTH MOS. Analyzed the data: EAL MGIL AD KSYS MOS. Contributed reagents/materials/analysis tools: CT TTH JAE MTF. Wrote the paper: EAL.
We report the sequencing of seven genomes from two haloarchaeal genera,
In recent years, the Archaea have been shown to play major roles in global element cycling
The family
In addition to serving as a model group for study of the Archaea in general, the Haloarchaea possess unique properties making them important objects of study in their own right. For example, understanding the genetic basis for these organisms’ ability to thrive in hypersaline environments (∼3–5 M salts) will inform efforts to develop salt-tolerant crop plants for growth in currently non-arable land. The Haloarchaea are also promising sources of salt- and ionic-liquid-tolerant enzymes for various industrial processes, including biofuels manufacturing
As of September 2011, the National Center for Biotechnology Information (NCBI) lists 1,628 completed bacterial, 37 eukaryotic and 116 archaeal genomes
While published and in-progress haloarchaeal genomes have yielded insights into the biology and evolution of these organisms
| Organism | Accession numbers | Reference |
|  | PubSEED: 662479.5 | This study |
|  | PubSEED: 662478.4 | This study |
|  | PubSEED: 662480.4 | This study |
|  | PubSEED: 523841.6 | This study |
|  | GenBank: CP001953.1–CP001957.1 |  |
|  | PubSEED: 662475.4 | This study |
|  | GenBank: AY596290.1–AY596298.1 |  |
|  | PubSEED: 662476.5 | This study |
|  | PubSEED: 662477.4 | This study |
|  | GenBank: CP001365.1–CP001367.1 | None |
|  | GenBank: CP001688.1, CP001689.1 |  |
|  | GenBank: CP001687.1 |  |
|  | GenBank: CR936257.1–CR936259.1 |  |
|  | GenBank: AM774415.1–AM774419.1 |  |
|  | GenBank: AE004437.1, AF016485.1, AE004438.1 |  |
|  | GenBank: AM180088.1, AM180089.1 |  |
|  | GenBank: CP001690.1–CP001695.1 |  |
|  | GenBank: CP002062.1–CP002068.1 |  |
|  | GenBank: CP001932.1–CP001935.1 |  |
|  | GenBank: CP001860.1–CP001866.1 |  |
|  | GenBank: CP002839.1–CP002842.1 |  |
Eight species of the family Halobacteriaceae, three Haloarcula (Har. californiae, Har. sinaiiensis, Har. vallismortis) and five Haloferax (Hfx. denitrificans, Hfx. mediterranei, Hfx. mucosum, Hfx. sulfurifontis, and Hfx. volcanii), were sequenced on a single GS FLX Titanium run, with the previously sequenced Haloferax volcanii included as a control. After removal of low-quality nucleotides, the mean read length for each genome was between 410 and 439 base pairs.
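The trimming rule behind "removal of low-quality nucleotides" is not specified here. As a minimal sketch, assuming reads are available as sequence plus ASCII-encoded Phred quality strings (offset 33, as in FASTQ) and that bases are trimmed from the 3' end while their quality is below a hypothetical Q20 threshold, the mean trimmed read length could be computed as below. The function names and the threshold are illustrative assumptions, not the pipeline used for the GS FLX Titanium data.

```python
from statistics import mean


def phred_scores(quality: str, offset: int = 33) -> list:
    """Decode an ASCII quality string into Phred scores (offset 33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, quals: list, min_q: int = 20) -> str:
    """Drop bases from the 3' end while their Phred score is below min_q.

    The Q20 default is a hypothetical choice, not the study's setting.
    """
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end]


def mean_trimmed_length(reads, min_q: int = 20):
    """Mean length of (sequence, quality string) reads after 3' trimming."""
    return mean(len(trim_3prime(s, phred_scores(q), min_q)) for s, q in reads)


reads = [("ACGTACGT", "IIIIII##"), ("ACGT", "IIII")]  # 'I' = Q40, '#' = Q2
print(mean_trimmed_length(reads))  # 5
```

Trimming only the 3' tail reflects the usual decline of quality toward the end of pyrosequencing reads; a sliding-window rule would be an equally plausible alternative.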
Distribution of read length in seven newly sequenced haloarchaeal genomes and one sequencing control
| Organism | #Contigs >200 bp | Assembled bp | Coverage (fold) | CDSs | % CDSs with function | RNAs | % Coding | %GC | Isolated from |
|  | 168 | 4,420,514 | 21 | 4627 | 62.5 | 69 | 87.00 | 60.82 | Baja, Mexico |
|  | 140/10* | 4,524,388 | 19/45* | 4538 | 63.1 | 55 | 84.62 | 60.77 | Red Sea, Israel |
|  | 88 | 3,930,055 | 24 | 4084 | 65.2 | 84 | 88.22 | 61.79 | Death Valley, California, USA |
|  | 9 | 4,274,642 | N/A | 4325 | 51.8 | 62 | 85.83 | 61.12 | Dead Sea |
|  | 21 | 3,848,468 | 25 | 3809 | 70.5 | 58 | 85.80 | 66.27 | San Francisco Bay, California, USA |
|  | 141/5* | 3,905,749 | 26/53* | 3942 | 65.7 | 62 | 85.83 | 60.27 | Alicante, Spain |
|  | 26 | 3,371,699 | 29 | 3455 | 66.0 | 61 | 86.38 | 61.84 | Shark Bay, Australia |
|  | 29 | 3,816,558 | 27 | 3856 | 67.6 | 59 | 86.56 | 66.30 | Zodletone spring, SW Oklahoma, USA |
|  | 145 | 3,920,004 | 25 | N/A | N/A | N/A | N/A | N/A | Shore mud, Dead Sea |
|  | 5 | 4,012,900 | N/A | 4015 | 63.3 | 49 | 85.56 | 65.48 | Shore mud, Dead Sea |
Note: *Indicates results of mate-pair sequencing.
Note: The
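Two columns of the table above relate to the raw data by simple arithmetic. Assuming that Coverage is the usual fold coverage (total sequenced bases divided by assembled length) and that %GC is computed over unambiguous A/C/G/T bases, a minimal sketch is shown below; the function names are hypothetical. For example, 21-fold coverage of the 4,420,514 bp assembly in the first row corresponds to roughly 21 × 4,420,514 ≈ 92.8 Mbp of sequence (roughly, because the reported coverage is rounded).

```python
def fold_coverage(total_read_bases: int, assembled_bp: int) -> float:
    """Average sequencing depth: total sequenced bases / assembly length."""
    return total_read_bases / assembled_bp


def gc_percent(seq: str) -> float:
    """Percent G+C among unambiguous A/C/G/T bases; ambiguous bases (e.g. N) are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


# 21-fold coverage of a 4,420,514 bp assembly implies ~92.8 Mbp of sequence.
print(fold_coverage(21 * 4_420_514, 4_420_514))  # 21.0
print(gc_percent("GGCCAATTN"))  # 50.0
```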
There were thirteen single base call discrepancies between the
To determine the benefit of paired-end sequencing for assembly, additional paired-end libraries with an 8-kb span were sequenced for one species of each genus,
The assembly of
The newly sequenced genomes were similar in size to previously sequenced haloarchaea, ranging from 3.37 to 4.45 Mbp with GC content between 60.3% and 66.3%
Further investigation of the protein coding complements of
COGs comprising a significantly different fraction of protein coding genes between
The low number of scaffolds in the final
GenoPlast was used to infer rates of genomic gain (red) and loss (blue) in the Haloferax lineage; rates are depicted by line width along each branch of the phylogeny. Thin bordering blue and red lines represent 95% confidence intervals.
As an independent investigation of gene flux within the
Number of genes in each category differentially present between the
As has been discussed previously
A semi-phosphorylated alternative to the Entner-Doudoroff pathway, which bypasses the first six steps in glycolysis, has been previously described for the haloarchaea
To investigate whether these results could stem from a high level of sequence divergence of this enzyme, a BLASTp search against NCBI’s non-redundant protein database was conducted with the same query. The only archaeal match was to
Several recent studies have highlighted the unique metabolic capabilities of the haloarchaea and have uncovered enzymes with potential utility in several industrial processes, including biofuels manufacturing. Here we expand upon previous studies by surveying twenty-one haloarchaea for genes of biotechnological importance as well as genes involved in several metabolic processes unique to the haloarchaea.
Current biofuel production processes depend upon the use of cellulases, which are abundantly distributed in nature. However, the strong ionic liquids increasingly used in biomass pretreatment are inhibitory to many cellulases
Gene counts of enzymes discussed in the text superimposed on
| Organism | CRISPRs | Cas genes |
|  | 16 | 7 |
|  | 12 | 7 |
|  | 10 | 6 |
|  | 5 | 7 |
|  | 4 | 6 |
|  | 6 | 4 |
|  | 5 | 7 |
|  | 3 | 6 |
|  | 1 | 0 |
|  | 3 | 11 |
|  | 2 | 6 |
|  | 2 | 6 |
|  | 4 | 6 |
|  | 0 | 0 |
|  | 0 | 0 |
|  | 0 | 0 |
|  | 1 | 0 |
|  | 0 | 0 |
|  | 2 | 0 |
|  | 1 | 0 |
|  | 0 | 0 |
Note: Number of CRISPRs is the number of distinct CRISPR clusters.
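As a quick consistency check, the per-genome counts in the table above can be tallied by category. The sketch below transcribes the table rows in order (organism names are not reproduced) and classifies each genome; the result, twelve genomes with both CRISPR arrays and Cas genes, agrees with the count given in the CRISPR analysis. The grouping labels are our own.

```python
# (CRISPR clusters, Cas genes) per genome, in the row order of the table above;
# organism names are not reproduced here.
CRISPR_CAS = [
    (16, 7), (12, 7), (10, 6), (5, 7), (4, 6), (6, 4), (5, 7),
    (3, 6), (1, 0), (3, 11), (2, 6), (2, 6), (4, 6), (0, 0),
    (0, 0), (0, 0), (1, 0), (0, 0), (2, 0), (1, 0), (0, 0),
]


def summarize(rows):
    """Classify genomes by whether they carry CRISPR arrays, Cas genes, or both."""
    summary = {"genomes": len(rows), "both": 0, "crispr_only": 0, "cas_only": 0, "neither": 0}
    for crisprs, cas in rows:
        if crisprs and cas:
            summary["both"] += 1
        elif crisprs:
            summary["crispr_only"] += 1
        elif cas:
            summary["cas_only"] += 1
        else:
            summary["neither"] += 1
    return summary


print(summarize(CRISPR_CAS))
# {'genomes': 21, 'both': 12, 'crispr_only': 4, 'cas_only': 0, 'neither': 5}
```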
Haloarchaea have also been explored for their ability to produce polyhydroxyalkanoates (PHA), a potential renewable and biodegradable substitute for petroleum-derived plastics
The novel metabolic capabilities of haloarchaea include the recently discovered methylaspartate cycle, a unique pathway for assimilation of acetyl-CoA derived from the metabolism of organic substrates, identified in
The results of our genomic screens illustrate the power of comparative genomics for discovering patterns in gene distribution and for selecting target organisms or clades in which to conduct searches for genes of functional interest. For instance, we show that the search for polyhydroxyalkanoate synthases would benefit from including the
The opsin family proteins are widespread in the Haloarchaea and serve a number of important roles in the light-dependent physiology of these organisms. Four main classes of haloarchaeal opsins have been previously described: the bacteriorhodopsin H⁺ pump utilizes light energy to establish a proton gradient for ATP production, halorhodopsin serves as a Cl⁻ pump to regulate cytoplasmic osmolarity, and the class 1 and class 2 sensory rhodopsins enable phototactic and photophobic responses to different wavelengths of light. In addition, a third class of sensory opsins with unknown function has been reported for
We performed a genomic screen to identify opsin homologs in the twenty-one currently sequenced haloarchaeal genomes. One or more opsins were found in fourteen of the twenty-one genomes, with most species lacking at least one of the four canonical opsin classes. In fact, only
A maximum likelihood tree of the four previously described haloarchaeal opsin families along with the newly described sensory rhodopsin 3, with bootstrap support values above 0.50 shown for 500 bootstrap iterations. Sensory rhodopsin - SR, halorhodopsin - HR, bacteriorhodopsin - BR.
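Bootstrap support of the kind shown in this tree is the fraction of replicate trees that recover a given clade. A minimal sketch, assuming the 500 replicate trees have already been inferred and reduced to their clades (each clade a frozenset of taxon labels), counts clades across replicates and keeps those above 0.50. Tree building is not shown, and the taxon labels in the example are hypothetical.

```python
from collections import Counter


def clade_support(replicate_clades, threshold=0.50):
    """Return {clade: support} for clades found in more than `threshold` of replicates.

    replicate_clades holds one collection of clades per bootstrap replicate tree;
    each clade is a frozenset of taxon labels. Building the replicate trees
    (e.g. by maximum likelihood on resampled alignment columns) is not shown.
    """
    n = len(replicate_clades)
    counts = Counter(clade for clades in replicate_clades for clade in set(clades))
    return {clade: k / n for clade, k in counts.items() if k / n > threshold}


# Hypothetical example with 500 replicates and made-up taxon labels.
br = frozenset({"BR_1", "BR_2"})
sr = frozenset({"SR_1", "SR_2"})
replicates = [{br, sr}] * 200 + [{br}] * 200 + [set()] * 100
support = clade_support(replicates)
# br appears in 400/500 replicates (0.8) and is kept; sr (0.4) is dropped.
```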
Proliferating cell nuclear antigen (PCNA), also referred to as the DNA sliding clamp in archaea, plays an essential role in many aspects of DNA metabolism, serving as a processivity clamp for the replicative DNA polymerase and also acting as a scaffold for recruitment of proteins with diverse roles in DNA metabolism
We have discovered four instances of duplicate PCNAs in the haloarchaea, including the newly sequenced
(A) Crystal structure of
Previous work has shown
The Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) system is a recently discovered mechanism of antiviral immunity present in a variety of bacterial and archaeal genomes. Although previous studies have reported CRISPR arrays in approximately 40% of tested bacterial and 90% of archaeal genomes
The twenty-one sequenced haloarchaeal genomes available at the time of this study were searched for CRISPRs and CRISPR-associated (Cas) genes. Twelve of the twenty-one species were found to possess both CRISPRs and one or more Cas genes, including all
The direct repeats (DRs) of the CRISPR array have been previously shown to be highly conserved across CRISPR loci both within and between closely related species of haloarchaea
Alignment of the conserved haloarchaeal DRs shows many highly conserved positions and a few positions (mainly in the center of the repeat) with no apparent conservation. This conservation pattern suggests a conserved stem-loop in the secondary structure, which was supported by structure prediction with the program RNAfold
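Per-position conservation of an aligned set of direct repeats, as displayed in the heatmap of the DR secondary structure, can be approximated by the percentage of sequences sharing the most common base in each column. The sketch below assumes the DRs are already aligned to equal length with '-' for gaps; the toy sequences are made up and are not real haloarchaeal DRs, and this majority-base measure is one simple choice among several possible conservation scores.

```python
from collections import Counter


def column_conservation(aligned):
    """Percent of sequences that share the most common base in each alignment column.

    Gaps ('-') never count as the conserved base but do count in the denominator.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    percents = []
    for i in range(length):
        column = [seq[i] for seq in aligned]
        bases = Counter(b for b in column if b != "-")
        top = bases.most_common(1)[0][1] if bases else 0
        percents.append(100.0 * top / len(column))
    return percents


# Made-up toy alignment, not real haloarchaeal DRs.
toy = ["GTTC", "GTAC", "GT-C"]
print([round(p, 1) for p in column_conservation(toy)])  # [100.0, 100.0, 33.3, 100.0]
```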
(A) Secondary structure of a sub-group of highly conserved CRISPR direct repeats (DRs) predicted with RNAfold, with percent conservation shown as a heatmap. (B) Maximum likelihood tree of DRs with bootstrap support values above 0.50 shown for 500 bootstrap iterations.
Archaeal transcription initiation is dependent upon two general transcription factors (GTFs) orthologous to eukaryotic TATA-binding protein (TBP) and transcription factor II B (TFIIB, known as TFB in the Archaea)
Our analysis reveals that, although TBP and TFB expansions are widespread across the haloarchaea (
A maximum likelihood tree of TATA-binding protein (TBP) homologs identified by RAST with bootstrap support values above 0.50 shown for 500 bootstrap iterations. Successive duplications are shown in darkening shades of green (
| Organism | TBPs | TFBs |
|  | 4 | 7 |
|  | 4 | 8 |
|  | 3 | 9 |
|  | 4 | 8 |
|  | 4 | 9 |
|  | 2 | 8 |
|  | 1 | 8 |
|  | 1 | 8 |
|  | 1 | 7 |
|  | 4 | 9 |
|  | 1 | 4 |
|  | 1 | 7 |
|  | 1 | 7 |
|  | 8 | 9 |
|  | 6 | 7 |
|  | 2 | 8 |
|  | 2 | 8 |
|  | 1 | 7 |
|  | 1 | 7 |
|  | 1 | 6 |
|  | 1 | 9 |
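The copy numbers in the table above can be summarized by counting genomes with more than one copy of each general transcription factor. Treating "more than one copy" as an expansion is a simple working definition chosen for illustration, not necessarily the criterion used in the analysis; the rows are transcribed in table order without organism names.

```python
# TBP and TFB copy numbers per genome, in the row order of the table above.
TBP = [4, 4, 3, 4, 4, 2, 1, 1, 1, 4, 1, 1, 1, 8, 6, 2, 2, 1, 1, 1, 1]
TFB = [7, 8, 9, 8, 9, 8, 8, 8, 7, 9, 4, 7, 7, 9, 7, 8, 8, 7, 7, 6, 9]


def expansion_summary(copies):
    """Count genomes with more than one copy, plus the min/max copy number."""
    return {
        "genomes": len(copies),
        "expanded": sum(1 for c in copies if c > 1),
        "min": min(copies),
        "max": max(copies),
    }


print("TBP", expansion_summary(TBP))  # expanded: 11 of 21, range 1-8
print("TFB", expansion_summary(TFB))  # expanded: 21 of 21, range 4-9
```

By this definition every genome carries multiple TFBs, whereas multiple TBPs occur in about half of the genomes.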
Comparisons of the twenty-one currently available haloarchaeal genomes have revealed dynamics of genome evolution at scales ranging from horizontal transfer and duplication of individual genes to major gene loss events and large-scale expansion of functional groups. Deep sequencing of the
Strains were acquired as desiccated cells from the American Type Culture Collection (ATCC) in Manassas, Virginia and were rehydrated in recommended media according to ATCC protocols. Strains were grown to stationary phase at 37°C in liquid culture, and genomic DNA was harvested with the Wizard Genomic DNA Purification Kit (Promega).
Fragment libraries were constructed for eight species of the family Halobacteriaceae, three from the genus
Genomes were annotated with Rapid Annotation Using Subsystems Technology (RAST)
In order to determine phylogenetic distribution of haloarchaeal genes, a gene presence/absence matrix was constructed by the following process. Independent multi-genome alignments were made for the
Hidden Markov models (HMMs) were generated for each SHT using HMMER 3, resulting in 13,276 HMMs. The 1,303 completed archaeal and bacterial genomes available from NCBI as of March 15, 2011 were downloaded, and a single genome from each genus was selected at random, resulting in 396 genomes. Each SHT HMM was searched against these 396 genomes and the eight halophile genomes generated for this study using HMMER 3. A gene was counted as belonging to an HMM if the hit had an E-value below 0.0001 and covered more than 80% of the length of both the gene and the HMM. If a gene hit more than one HMM, it was counted only for the HMM with the best E-value. These hits were then used to generate a 13,276 × 405 presence/absence matrix. The genomes and HMMs were clustered using the ‘ctc’ library in R
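The genome-sampling and hit-filtering rules described above can be sketched as follows. This is a minimal illustration under stated assumptions: hits are taken to be already parsed from HMMER 3 output into tuples carrying gene and HMM coverage fractions (the parsing step is not shown), genus sampling uses Python's `random` module with a hypothetical fixed seed, and all function names and toy labels are our own. The clustering step with the ‘ctc’ library in R is not reproduced.

```python
import random
from collections import defaultdict


def pick_one_genome_per_genus(genomes_by_genus, seed=0):
    """Keep one randomly chosen genome per genus; the seed is fixed for reproducibility."""
    rng = random.Random(seed)
    return {genus: rng.choice(sorted(ids)) for genus, ids in genomes_by_genus.items()}


def assign_hits(hits, max_evalue=1e-4, min_cov=0.80):
    """Filter HMM hits and assign each gene to its single best HMM.

    hits: iterable of (genome, gene, hmm, evalue, gene_cov, hmm_cov) tuples,
    where the coverages are fractions of gene and HMM length (assumed
    precomputed from HMMER 3 output; parsing is not shown).
    """
    best = {}
    for genome, gene, hmm, evalue, gene_cov, hmm_cov in hits:
        # Keep only E-value < 1e-4 and > 80% coverage of both gene and HMM.
        if evalue >= max_evalue or gene_cov <= min_cov or hmm_cov <= min_cov:
            continue
        key = (genome, gene)
        # A gene hitting several HMMs counts only for the lowest E-value.
        if key not in best or evalue < best[key][1]:
            best[key] = (hmm, evalue)
    return best


def presence_absence(best, hmms, genomes):
    """Rows = HMMs, columns = genomes; 1 if any gene of that genome is assigned to the HMM."""
    present = defaultdict(set)
    for (genome, _gene), (hmm, _evalue) in best.items():
        present[hmm].add(genome)
    return [[1 if g in present[h] else 0 for g in genomes] for h in hmms]


# Toy data with hypothetical genome, gene and SHT labels.
hits = [
    ("gA", "g1", "SHT1", 1e-10, 0.95, 0.90),
    ("gA", "g1", "SHT2", 1e-20, 0.85, 0.81),  # better E-value, wins for g1
    ("gA", "g2", "SHT1", 1e-3, 0.99, 0.99),   # fails the E-value cutoff
    ("gB", "g3", "SHT1", 1e-8, 0.70, 0.95),   # fails gene coverage
    ("gB", "g4", "SHT1", 1e-8, 0.90, 0.95),
]
matrix = presence_absence(assign_hits(hits), ["SHT1", "SHT2"], ["gA", "gB"])
print(matrix)  # [[0, 1], [1, 0]]
```

Assigning each gene to a single best HMM keeps a gene from being counted in several SHTs, so each column of the matrix reflects non-overlapping gene assignments.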
Three methods were used to determine the phylogenetic relationships among the newly sequenced haloarchaea. First, a tree was constructed using a set of twenty-eight highly conserved marker genes identified with AMPHORA
A second method was also used wherein the proteins within each of the 398 SHTs conserved across the seven newly-sequenced haloarchaeal genomes and the two previously sequenced representatives of the
Third, a tree was constructed based on the commonly used molecular marker
The multiple genome alignment of six
Opsin homologs were gathered from the RAST-annotated GenBank files using in-house scripts available at our website
PCNA homologs were gathered from the RAST-annotated GenBank files using in-house scripts, available at our website
CRISPRs were identified for the twenty-one haloarchaeal genomes included in this study using the PILER-CR CRISPR prediction program
Protein sequences of the predicted general transcription factors (GTFs) TATA-binding protein (TBP) and transcription factor B (TFB) were extracted from the RAST-annotated GenBank files using in-house scripts, available at our website
Acid-hydrolyzed rice hull obtained from MicroMidas (Sacramento) was neutralized to pH ∼7 and inoculated with
The authors would like to thank members of the Facciotti and Eisen labs for helpful discussions and for critically reviewing the manuscript.