Conceived and designed the experiments: QC LNK NVG. Performed the experiments: QC LNK. Analyzed the data: QC LNK. Contributed reagents/materials/analysis tools: BHK. Wrote the paper: QC.
The authors have declared that no competing interests exist.
Ever since HLB was described, efforts have been devoted to understanding the plant response to the infection
In 2009, the complete genome sequence of
The
Here we report computational analysis followed by partial manual curation of the
All the sequences of
First, we predicted the local sequence features (listed in
Feature | Programs Used For The Prediction | Implication |
Secondary structure | PSIPRED (v2.0) | assist 3D structure and domain boundary prediction |
Disordered region | DISEMBL (v1.5) | assist 3D structure modeling and indicate the domain boundaries |
Transmembrane helix | TMHMM (v2.0), TOPPRED (v2.0), HMMTOP (v2.0), MEMSAT (v3.0), MEMSATSVM and Phobius | predict subcellular localization and the topology of membrane proteins; provide hints to the protein function |
Signal peptide | SignalP (v3.0), Phobius and MEMSATSVM | predict secreted proteins that could potentially be virulence factors |
Low complexity region | SEG | reveal false positive hits of homology search caused by matching of low-complexity regions |
Coiled coils | COILS | reveal false positive hits of homology search caused by matching of non-homologous coiled coils |
Sequence conservation | PSI-BLAST, AL2CO | reveal essential residues for the folding and function of a protein |
Based on the information on our website, we manually assigned function to each protein and selected templates to build a structural model by homology modeling using MODELLER
Homologous proteins within the
The results of computational analysis of all 1,233
This section provides relevant information from, and links to, other databases. Several existing annotations are listed, including: the gene description from NCBI (definition line in the NCBI Protein Database), the COG prediction (from NCBI, based on homologous relationships to protein families in the Clusters of Orthologous Groups (COG) database), the KEGG prediction (annotation in the KEGG database) and the SEED prediction (annotation in the SEED database).
(A) Section I: basic information with function predictions from different resources and links to other databases. (B) Section II: local sequence feature prediction. It contains the following information: (1) sequence (highlighted according to the property of each amino acid) from the NCBI database; (2),(3) secondary structure prediction by PSIPRED and SSPRO (H: α helix, E: β strand, C: coil); (4) coil and loop (highlighted in pink) prediction by DISEMBL; (5) flexible loop (highlighted in pink) prediction by DISEMBL; (6) low complexity region (highlighted in light red) prediction by SEG; (7)-(9) disordered region (highlighted in red) prediction by DISOPRED, DISEMBL and DISPRO; (10)-(15) transmembrane helix (highlighted in blue) prediction by TMHMM, TOPPRED2, HMMTOP, MEMSAT, MEMSATSVM and Phobius; (14)-(17) signal peptide (highlighted in green) prediction by MEMSATSVM, Phobius, SignalP Hidden Markov Model mode and SignalP Neural Network mode; (18) coiled coils (highlighted in yellow) prediction by COILS; (19),(20) sequence colored by conservation (highlighted from white, through yellow, to dark red as the level of conservation increases) computed on the multiple sequence alignment of homologous proteins filtered by 70% or 90% sequence identity. (C) Section III: the top 10 homologs detected by BLAST or 2 iterations of PSI-BLAST are listed. For each hit, the alignment and the species associated with the hit are provided. (D) Section IV: homologous protein families and conserved domains detected by RPS-BLAST. The confident hits detected by each method are listed, and the relevant information of each protein family and its alignment to the
Local sequence properties, such as predicted secondary structures and disordered regions, are helpful for predicting 3D structures, whereas signal peptide (SP) and transmembrane helix (TMH) predictions are suggestive of protein localization and function. This section summarizes the predictions of local sequence features (listed in
Close homologs usually share similar functions inherited from a common ancestor, which is the basis for function prediction. In addition, the phylogenetic distribution of closely related proteins provides hints about evolutionary history and can reveal horizontal gene transfer (HGT) events. HGT has a profound impact on the evolution of bacterial pathogens and is a common mechanism for gaining virulence-associated genes
Protein classification and the extensive information gathered for protein families in databases are a valuable resource for functional annotation. In this section, we list related protein families and conserved domains identified by RPS-BLAST (e-value cutoff 0.005) and HHsearch (probability cutoff 90%) in ranked order. Information is presented in a format similar to that described in section III, with a summary of hits at the top and detailed alignments and descriptions of protein families listed at the bottom.
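The cutoffs quoted above can be illustrated with a minimal sketch. Only the two thresholds (e-value 0.005 for RPS-BLAST, probability 90% for HHsearch) come from the text; the `Hit` record, the family names in the usage note, the inclusive comparison at the cutoff, and the per-method ranking are assumptions made for illustration and do not describe the actual pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Cutoffs quoted in the text; everything else is a hypothetical illustration.
RPS_BLAST_EVALUE_CUTOFF = 0.005
HHSEARCH_PROBABILITY_CUTOFF = 90.0  # percent


@dataclass
class Hit:
    family: str                          # protein family or conserved domain name
    method: str                          # "RPS-BLAST" or "HHsearch"
    evalue: Optional[float] = None       # reported for RPS-BLAST hits
    probability: Optional[float] = None  # reported for HHsearch hits, in percent


def is_confident(hit: Hit) -> bool:
    """Apply the method-specific cutoff (inclusive bounds are an assumption)."""
    if hit.method == "RPS-BLAST":
        return hit.evalue is not None and hit.evalue <= RPS_BLAST_EVALUE_CUTOFF
    if hit.method == "HHsearch":
        return (hit.probability is not None
                and hit.probability >= HHSEARCH_PROBABILITY_CUTOFF)
    return False


def confident_hits(hits: List[Hit]) -> Dict[str, List[Hit]]:
    """Keep confident hits and rank them per method, best hit first."""
    kept = [h for h in hits if is_confident(h)]
    rps = sorted((h for h in kept if h.method == "RPS-BLAST"),
                 key=lambda h: h.evalue)
    hhs = sorted((h for h in kept if h.method == "HHsearch"),
                 key=lambda h: -h.probability)
    return {"RPS-BLAST": rps, "HHsearch": hhs}
```

For example, given hypothetical hits `famA` (e-value 1e-10), `famB` (e-value 0.01) and `famC` (probability 99.5%), the function keeps `famA` and `famC` and drops `famB`.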
Homology modeling remains the most reliable and effective way to predict protein 3D structure
With the information from our website, we performed manual analysis to predict the spatial structure and function of each protein, and the results are available at
We combined the results of computer programs and manual curation to identify potential transmembrane and extracytoplasmic proteins. We applied 6 TMH predictors (TMHMM
Periplasmic and extracellular proteins are generally targeted to their specific subcellular compartments via protein secretion systems. Gram-negative bacteria possess 6 classic protein secretion systems. Type II and Type V Secretion Systems transport proteins from periplasm to extracellular space. Their function requires Sec or Tat machinery to translocate proteins from cytoplasm to periplasm. In contrast, Type I, Type III, Type IV and Type VI Secretion Systems can directly export proteins from cytoplasm to extracellular space and thus do not depend on Sec or Tat
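The dependence of each secretion system on Sec or Tat, as summarized in the paragraph above, can be written as a small lookup table. This is only a restatement of that paragraph; the names `SEC_OR_TAT_DEPENDENT` and `export_route` are hypothetical.

```python
# True: substrates first cross the inner membrane via Sec or Tat.
# False: substrates are exported directly from the cytoplasm.
SEC_OR_TAT_DEPENDENT = {
    "Type I": False,
    "Type II": True,
    "Type III": False,
    "Type IV": False,
    "Type V": True,
    "Type VI": False,
}


def export_route(system: str) -> str:
    """Describe the export route of a secretion system as stated in the text."""
    if SEC_OR_TAT_DEPENDENT[system]:
        return "cytoplasm -> periplasm (Sec or Tat) -> extracellular space"
    return "cytoplasm -> extracellular space"
```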
A substrate of the Sec complex can be recognized by an N-terminal SP, which is a hydrophobic α-helical segment flanked by a positively charged short region at its N-terminus and several polar residues at its C-terminus that could be cleaved by the Sec machinery. We manually examined all 218 proteins that were predicted to have SPs by any automatic method to identify extracytoplasmic proteins. After integrating additional evidence, we hypothesize that 86 proteins (marked in Supplementary
Many proteins from the initial list of 218 candidates were excluded for the following reasons: (1) the SP cannot be consistently predicted (predicted by only 1 out of 4 methods); (2) the protein is predicted to have multiple TMHs, such as the sensory box/GGDEF family protein (locus: CLIBASIA_01765; gi: 254780468); (3) the confidently predicted function of the protein suggests its localization in the inner membrane or cytoplasm, for example, the ribosomal protein L35, which is predicted to have an SP by 3 out of the 4 predictors applied; (4) close homologs likely lack SPs. It is important to note that transmembrane proteins might have SPs at their N-termini, although such cases are not common in bacteria
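The four exclusion criteria listed above can be sketched as a simple flagging function. The curation described in the text was manual, so this is only an illustrative approximation: the `SPCandidate` record, its field names, and the numeric reading of "multiple TMHs" as two or more are assumptions, and the last two fields stand in for expert judgments rather than computed values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SPCandidate:
    locus: str                                    # hypothetical identifier
    sp_votes: int                                 # how many of the 4 SP predictors report an SP
    n_tmh: int                                    # number of predicted transmembrane helices
    cytoplasmic_or_inner_membrane_function: bool  # judged from the predicted function
    close_homologs_lack_sp: bool                  # judged from close homologs


def exclusion_reasons(c: SPCandidate) -> List[str]:
    """Return the criteria (1)-(4) from the text that a candidate triggers."""
    reasons = []
    if c.sp_votes <= 1:
        reasons.append("SP predicted by at most 1 of 4 methods")
    if c.n_tmh >= 2:  # assumed threshold for "multiple TMHs"
        reasons.append("multiple predicted TMHs")
    if c.cytoplasmic_or_inner_membrane_function:
        reasons.append("function implies inner membrane or cytoplasm")
    if c.close_homologs_lack_sp:
        reasons.append("close homologs likely lack SPs")
    return reasons
```

A candidate that returns an empty list is not automatically extracytoplasmic; it only passes these coarse filters and would still need the additional evidence mentioned in the text.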
However, this bacterium and the other congener (
Proteins without SPs can be secreted in Sec-independent manners. We detected these proteins by their homology to known substrates of Sec-independent secretion systems and their genomic loci.
In addition, the flagellar assembly and
576
In summary, we hypothesize that 86 proteins are secreted via the Sec machinery and that 21 proteins without SPs are likely targeted to the extracytoplasmic space through Sec-independent mechanisms. In addition, 184 proteins are likely located in the inner membrane of this Gram-negative bacterium (shown in
The yellow disk represents the set of protein coding genes identified by NCBI, and the pink disk represents the set of protein coding genes predicted by the SEED. The red, blue and green circles enclose, respectively, all confidently predicted protein coding genes, transmembrane proteins, and proteins secreted via Sec in the proteome after manual inspection.
Confidently identified homology to known proteins or protein families allows us to predict the functions of 80.4% of all 1,105 proteins, while NCBI and SEED annotated 67.6% and 71.0% of them, respectively, or 74.1% combined. Moreover, of the 217 proteins lacking explicit function predictions, our manual curation discussed above predicts that 40 are secreted through the Sec machinery and thus function in the extracytoplasmic space, and that 49 are likely to be transmembrane proteins in the inner membrane. Together, these proteins comprise 41.0% of the unannotated proteins. Their predicted subcellular localization suggests a general function in communicating with the environment. (All function and localization predictions are listed in Supplementary
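As a quick consistency check, the percentages in this paragraph can be recomputed from the counts it quotes. The sketch assumes that the 217 proteins lacking explicit function predictions are exactly the complement of the annotated set; the variable names are illustrative.

```python
total_proteins = 1105        # all proteins considered
unannotated = 217            # lacking explicit function predictions
secreted_unannotated = 40    # predicted to be secreted through Sec
membrane_unannotated = 49    # predicted inner membrane proteins

# Share of proteins with a function prediction.
annotated_pct = 100 * (total_proteins - unannotated) / total_proteins

# Share of unannotated proteins with a predicted localization.
localized_pct = 100 * (secreted_unannotated + membrane_unannotated) / unannotated

print(f"annotated: {annotated_pct:.1f}%")
print(f"unannotated with localization: {localized_pct:.1f}%")
```

Both values agree with the reported 80.4% and 41.0% after rounding to one decimal place.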
Another application of our website is to present putative homologous structures for template-based structure modeling. Confident templates identified by programs (HHsearch probability cutoff 90%, PSI-BLAST or RPS-BLAST e-value cutoff 0.005) and confirmed by manual curation cover 74.3% of all residues in the
More specific analysis of the
22% of
The first category is
Group | Locus | gi | Comments |
I | CLIBASIA_02215 | 254780556 | with SP, potential virulence factor |
 | CLIBASIA_04405 | 254780980 | with SP, potential virulence factor |
II | CLIBASIA_03915 | 254780886 | with SP, potential virulence factor |
 | CLIBASIA_04530 | 254781005 | with SP, potential virulence factor |
III | CLIBASIA_04425 | 254780984 | with SP, potential virulence factor |
 | CLIBASIA_05140 | 254781126 | lacks the SP region |
 | CLIBASIA_04410 | 254780981 | with SP, potential virulence factor |
IV | CLIBASIA_00440 + CLIBASIA_00445 | 254780203 + 254780204 | two neighboring proteins, each aligned to part of CLIBASIA_05480; possibly pseudogenes |
 | CLIBASIA_05130 + CLIBASIA_05135 | 254781124 + 254781125 | two neighboring proteins, each aligned to part of CLIBASIA_05480; possibly pseudogenes |
 | CLIBASIA_05480 | 254781189 | transmembrane protein |
The second category of duplicated genes is
The third category contains
One of the most unusual homologous groups consists of von Willebrand factor type A (shown in
(A) Domain diagram of the protein. (B) Predicted structure of the protein. The side-chains of the active site residues are shown.
Another homologous group that is potentially harmful to the host consists of the hypothetical proteins CLIBASIA_00070 (gi: 254780135), CLIBASIA_04065 (gi: 254780914), CLIBASIA_04140 (gi: 254780929) and CLIBASIA_04540 (gi: 254781007). They are all predicted to harbor SPs. These four proteins share more than 90% sequence identity with one another, so they likely preserve the same function. No confident homologs can be detected for them from organisms outside the
Interestingly, 1% of the
We inspected the species associated with each protein’s closest homolog in the NR database detected by BLAST, since this information is indicative of the evolutionary history of a protein. We excluded the closest relative,
Proteins whose closest homologs are from viruses are most likely related to bacteriophage integration. Most of them are from the recently integrated SC1 Liberibacter phage (colored green in Supplementary
Out of these proteins with abnormal evolutionary history, we identified several potential virulence factors (colored red in Supplementary
We carried out computational analysis of all
Qian Cong is a Howard Hughes Medical Institute International Student Research fellow. The authors are grateful to Dr. Jimin Pei for helpful discussion and Jeremy Semeiks for proofreading the manuscript.