Research Article

dndDB: A Database Focused on Phosphorothioation of the DNA Backbone

  • Hong-Yu Ou,

    Affiliation: Laboratory of Microbial Metabolism and School of Life Sciences & Biotechnology, Shanghai Jiaotong University, Shanghai, People's Republic of China

  • Xinyi He,

    Affiliation: Laboratory of Microbial Metabolism and School of Life Sciences & Biotechnology, Shanghai Jiaotong University, Shanghai, People's Republic of China

  • Yucheng Shao,

    Affiliation: Laboratory of Microbial Metabolism and School of Life Sciences & Biotechnology, Shanghai Jiaotong University, Shanghai, People's Republic of China

  • Cui Tai,

    Affiliation: Laboratory of Microbial Metabolism and School of Life Sciences & Biotechnology, Shanghai Jiaotong University, Shanghai, People's Republic of China

  • Kumar Rajakumar,

    Affiliations: Department of Infection, Immunity and Inflammation, Leicester Medical School, University of Leicester, Leicester, United Kingdom, Department of Clinical Microbiology, University Hospitals of Leicester NHS Trust, Leicester, United Kingdom

  • Zixin Deng

    zxdeng@sjtu.edu.cn

    Affiliation: Laboratory of Microbial Metabolism and School of Life Sciences & Biotechnology, Shanghai Jiaotong University, Shanghai, People's Republic of China

  • Published: April 09, 2009
  • DOI: 10.1371/journal.pone.0005132

Abstract

Background

The Dnd DNA degradation phenotype was first observed during electrophoresis of genomic DNA from Streptomyces lividans more than 20 years ago. It was subsequently shown to be governed by the five-gene dnd cluster. Similar gene clusters have now been found to be widespread among many other distantly related bacteria. Recently the dnd cluster was shown to mediate the incorporation of sulphur into the DNA backbone via a sequence-selective, stereo-specific phosphorothioate modification in Escherichia coli B7A. Intriguingly, to date all identified dnd clusters lie within mobile genetic elements, the vast majority in laterally transferred genomic islands.

Methodology

We organized available data from experimental and bioinformatics analyses about the DNA phosphorothioation phenomenon and associated documentation as a dndDB database. It contains the following detailed information: (i) Dnd phenotype; (ii) dnd gene clusters; (iii) genomic islands harbouring dnd genes; (iv) Dnd proteins and conserved domains. As of 25 December 2008, dndDB contained data corresponding to 24 bacterial species exhibiting the Dnd phenotype reported in the scientific literature. In addition, via in silico analysis, dndDB identified 26 syntenic dnd clusters from 25 species of Eubacteria and Archaea, 25 dnd-bearing genomic islands and one dnd plasmid containing 114 dnd genes. A further 397 other genes coding for proteins with varying levels of similarity to Dnd proteins were also included in dndDB. A broad range of similarity search, sequence alignment and phylogenetic tools are readily accessible to allow for individualized directions of research focused on dnd genes.

Conclusion

dndDB can facilitate efficient investigation of a wide range of aspects relating to dnd DNA modification and other island-encoded functions in host organisms. dndDB version 1.0 is freely available at http://mml.sjtu.edu.cn/dndDB/.

Introduction

The Dnd DNA degradation phenotype was observed during normal and pulsed-field gel electrophoresis of genomic DNA from Streptomyces lividans strain 66 [1]. DNA degradation during electrophoresis in the presence of tris, a commonly used biological buffer, has also been reported in many other distantly related bacterial species, such as Escherichia coli, Salmonella enterica, Klebsiella pneumoniae, Vibrio parahaemolyticus, Pseudomonas aeruginosa, Pseudomonas fluorescens, Mycobacterium abscessus, Clostridium botulinum, and Clostridium difficile. The Dnd phenotype was thought to involve a post-replicative DNA modification that rendered DNA susceptible to degradation at the electrophoretic anode. In 2005, the five-gene dndABCDE cluster responsible for this phenotype was identified in S. lividans [2]. Zhou et al. [2] demonstrated that the affected DNA had been modified in vivo by the addition of a sulphur-containing molecule through a likely biochemical pathway mediated by enzymes encoded by the dnd locus.

More recently the dnd cluster was shown to mediate the incorporation of sulphur into the DNA backbone via a sequence-selective, stereo-specific phosphorothioate modification in E. coli B7A [3]. By using high-performance liquid chromatography and mass spectrometry, the chemical structure of phosphorothioated DNA was determined, revealing a sulfur atom in place of one of the nonbridging oxygen atoms on a DNA backbone-borne phosphate group. To our knowledge, this was the first report of a natural modification of the DNA backbone itself, which sets it apart from the well-documented DNA methylation and other changes to DNA bases.

Intriguingly, the S. lividans dnd cluster lay within a large, mosaic genomic island named SLG [4], [5]. To date all 26 identified dnd clusters are borne on likely mobile genetic elements, twenty-five of which are harboured on genomic islands, fragments of alien DNA that have been incorporated into chromosomes of new hosts via horizontal gene transfer events [6].

The observed Dnd phenotype and recent microbiological, genetic and biochemical advances in the field have been reported in the scientific literature. However, disparate PubMed references and individual genome annotation and protein data deposited in public databases do not provide a unified resource required to facilitate the advanced searches, analyses and data manipulation necessary to fully exploit the available and rapidly emerging new data in the Dnd field. Consequently, we have created a MySQL database, dndDB, to efficiently organize all available data from experimental and bioinformatics analyses about the phosphorothioation of DNA in Eubacteria and Archaea and provide a central repository of associated documentation. We propose that our evolving, web-based dndDB resource will stimulate and facilitate research into many key questions, including the mechanism of sulfur incorporation, the biological significance of this DNA modification, the role, source and mode of dissemination of dnd-bearing genomic islands, and the potential for exploitation of these systems for biotechnological applications.

Results and Discussion

The purpose of dndDB is to provide a user-friendly interactive platform that not only efficiently archives, analyses and manipulates the growing body of data about bacterial and archaeal dnd genes, linked island-borne genes, matching sets of cognate proteins, and the DNA phosphorothioation process itself, but also empowers researchers from different backgrounds to explore novel angles potentially related to this, thus far, unique DNA backbone modification process. A broad range of similarity search, sequence alignment and phylogenetic tools are readily accessible to allow for user-directed interrogation of the database, examination of user-supplied sequences and other individualized directions of research.

User interface

The dndDB homepage contains the following interfaces: ‘Introduction’ (Dnd background and references), ‘Dnd phenotype’ (experimental protocol and archived literature), ‘Gene&Cluster’ (degenerate primers, gene homologues and clusters), ‘Genomic island’ (genomic context), ‘Protein&Domain’ (putative function, homologues, conserved domains and references), ‘Search’ (search Dnd phenotype, gene or protein homologues by organism name), ‘Blast vs dndDB’ (gene/protein sequence BLAST against dndDB), ‘tBlastn for Dnd’ (Dnd protein prediction in user-supplied nucleotide sequence), ‘Restrict_Modifica’ (Dnd-dependent restriction-modification system), ‘Chemistry’ (sequence- and stereo-specific nature of DNA phosphorothioation), ‘Potential Applications’, ‘Useful Links’ and ‘Contact Us’.

Organisms exhibiting the Dnd phenotype

Electrophoresis-associated DNA degradation, otherwise known as the Dnd phenotype, is a puzzling and long-standing phenomenon frequently observed during pulsed field gel electrophoresis (PFGE), when instead of discrete bands a smear pattern results. The current version of dndDB includes a description of the Dnd phenotype in 24 bacterial species based on information extracted from PubMed references. The phylogenetic diversity and wide prokaryotic representation of these Dnd phenotype-positive organisms and others that we have shown to harbour dnd gene clusters is shown in Figure 1. These data are tabulated and easily retrieved using the ‘Search’ tool in dndDB. In addition, users can download an optimized Dnd phenotype verification protocol which utilizes activated tris-acetate-EDTA (TAE) buffer during agarose gel electrophoresis to check the Dnd phenotype of bacterial strains of interest. A simple PCR-based protocol to identify potential dndC gene homologues in bacterial isolates developed using dndDB is also provided. This method is also intended to serve as a template for other dndDB-facilitated PCR-based screening assays.
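As an illustration of how a degenerate primer of the kind used in such a PCR-based screen can be matched against a candidate sequence, the following minimal Python sketch expands IUPAC ambiguity codes into a regular expression and reports forward-strand matches. The primer and template shown are invented for demonstration; they are not the dndC primers distributed with dndDB, and the function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_pattern(primer):
    """Translate a degenerate primer into a regular-expression string."""
    return "".join(IUPAC[base] for base in primer.upper())


def find_primer_sites(primer, template):
    """Return 0-based start positions of forward-strand primer matches.

    A lookahead is used so that overlapping matches are also reported.
    """
    pattern = "(?=" + primer_to_pattern(primer) + ")"
    return [m.start() for m in re.finditer(pattern, template.upper())]


# Hypothetical primer and template, for illustration only.
example_primer = "GAYTGGNC"
example_template = "TTGACTGGACAAGATTGGTCAA"
sites = find_primer_sites(example_primer, example_template)
print(sites)
```

A real screen would also need to search the reverse complement and tolerate mismatches away from the 3′ end; both are omitted here for brevity.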


Figure 1. Inferred phylogenetic relationship of the 31 bacterial and one archaeal organism carrying known dnd clusters (denoted by orange ‘G’ balls) and/or documented to exhibit the Dnd phenotype (denoted by purple ‘P’ balls).

The tree shown was constructed on the basis of NCBI taxonomy (http://www.ncbi.nlm.nih.gov/Taxonomy/) by using iTOL [11], which is now accessible via dndDB.

doi:10.1371/journal.pone.0005132.g001

Horizontal gene transfer

Comparative analysis of dnd genes at a variety of granularities, such as the single gene, gene cluster, genomic island or genome-scale level, will greatly aid investigations into the evolution of dnd gene clusters and the mechanisms that brought about their widespread dissemination across diverse and distant bacterial species. In dndDB, a powerful multiple sequence alignment algorithm, Muscle v3.7 [7], and a Java alignment editor, JalView 2.4 [8], were integrated to facilitate the comparison of the dnd gene clusters from 24 taxonomically distinct bacterial species and one archaeal member from various geographic niches. In addition, the popular GBrowse viewer [9], which combines a database and interactive web page, was employed for manipulating and displaying annotations on dnd-bearing genomes. Remarkably, all identified dnd clusters lay within larger mobile genetic elements: 23 within chromosomal islands, two within islands on the plasmid-derived chromosome II of Pseudoalteromonas haloplanktis TAC125 and Vibrio fischeri MJ11, and one on the large Plasmid 3 of Mesorhizobium sp. BNC1 (see Table 1 for details). Analysis of these putative dnd-encoding islands demonstrated common key features typical of genomic islands (GIs): organism-atypical G+C contents, integration into tRNA genes, and/or possession of terminal direct repeats, integrase- and/or transposase-encoding sequences. Phylogenetic analysis of the dnd genes in the 26 identified dnd clusters confirmed the diverse nature of these sequences. Furthermore, significant discordances between the 16S rDNA- and dnd-derived phylogenetic trees, marked differences in the gene content within the remainder of the dnd islands, and the frequent absence of dnd islands in members of the same species, strongly supported the notion that the diverse dnd clusters and their cognate islands had been acquired independently on many occasions, rather than arising from a single or limited number of vertical evolutionary events. However, to date none of the defined dnd islands have been shown to be functionally mobile, though at least one, the S. lividans SLG island, is known to function as a typical, self-circularizing, site-specific integrative element [5].
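One of the island features mentioned above, an organism-atypical G+C content, can be checked with very little code. The sketch below is a simplified illustration rather than the procedure used to build dndDB: it compares the G+C fraction of a candidate region with that of the remaining sequence. The toy sequence, coordinates and function names are assumptions made for the example.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous A/C/G/T bases (0.0 if none)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc_deviation(genome, start, end):
    """G+C of genome[start:end] minus G+C of the rest of the sequence.

    Coordinates are 0-based and end-exclusive, as in Python slicing.
    """
    island = genome[start:end]
    backbone = genome[:start] + genome[end:]
    return gc_content(island) - gc_content(backbone)


# Toy sequence: a G+C-rich backbone with an A+T-rich insert at positions 10-19.
toy_genome = "GCGCGCGCGC" + "ATATATGCAT" + "GCGCGGCGCC"
deviation = gc_deviation(toy_genome, 10, 20)
print(round(deviation, 2))
```

On real genomes a sliding-window or cumulative profile is normally preferred to a single subtraction, because local G+C content fluctuates along the chromosome.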


Table 1. dnd clusters present on mobile genetic elements comprising 25 genomic islands and one plasmid

doi:10.1371/journal.pone.0005132.t001

We have also incorporated the SynView tool [10] into dndDB to facilitate larger scale synteny mapping so as to permit ready recognition of dnd island-borne orthologous genes. Figure 2 illustrates an example based on comparison of dnd islands from Escherichia coli, Salmonella enterica and Enterobacter sp. Such analyses will aid the identification of evolutionary links between members of this growing family of islands.


Figure 2. The dnd island in the Salmonella enterica serovar Saintpaul SARA23 genome that is currently being sequenced.

(A) The top axis corresponds to the dnd island-bearing contig (NCBI Refseq accession no. NZ_ABAM01000005), while the lower axis represents a magnified view of the region shown in the red box. The symbol ‘k’ in the coordinates denotes kilobase pairs. (B) A schematic view of the lower axis (above) illustrating the location of the 19.7-kb dnd island (orange line), the leuX tRNA gene integration site (red arrow head), and the upstream/downstream flanking regions (black lines) that are conserved across 14 completely sequenced Salmonella enterica genomes. The ‘5end’ and ‘3end’ backbone labels refer to the 5′- and 3′-flanking backbone segments in relation to the orientation of the leuX tRNA gene, respectively. (C) SynView-facilitated synteny mapping of the dnd islands and immediate flanking sequences from three species: Salmonella enterica serovar Saintpaul SARA23 (19.7-kb island) [topmost], Escherichia coli B7A (17.9-kb tRNA-proximal end of island) [middle] and Enterobacter sp. 638 (16.9-kb island) [lowermost]. The dnd genes are highlighted in blue, while these and other island-harboured genes are marked by orange frames. Individual genes are hyperlinked to related information that can be accessed using GBrowse. Light-blue-shaded trapezoids link orthologous genes between the three species.

doi:10.1371/journal.pone.0005132.g002

Dnd proteins and conserved domains

Amino acid sequences of Dnd proteins from the diverse dnd-bearing hosts were multiply aligned with Muscle [7], then visualised and edited with JalView [8]. The neighbor-joining phylogenetic tree of matching 16S rRNA sequences was constructed by using Muscle and JalView. A phylogenetic tree based on NCBI taxonomy IDs of host organisms was also generated by using iTOL [11]. dndDB also contains a list of conserved domains and consensus sequences identified in Dnd proteins that have been previously deposited in the protein family database Pfam, the Conserved Domain Database (CDD), and/or the biological macromolecule 3-D structures database PDB [12]. In addition, hundreds of other proteins exhibiting lower levels of similarity to Dnd proteins, with Blastp E-values of less than 1E−4, were extracted from the NCBI nr database and stored in dndDB to allow for rapid identification of more distantly related potential homologues or proteins performing related functions.
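The E-value cut-off described above amounts to a simple filter over BLAST hits. The following sketch assumes the standard 12-column tab-delimited BLAST report, in which the E-value is the eleventh field, and reads the threshold as 1e-4. The hit identifiers and scores are invented, and this is not the extraction script used for dndDB.

```python
def filter_hits(tabular_text, max_evalue=1e-4):
    """Keep (query, subject, evalue) for hits with E-value below max_evalue.

    Expects 12-column tab-delimited BLAST output; comment and blank
    lines are skipped.
    """
    kept = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        evalue = float(fields[10])  # eleventh column holds the E-value
        if evalue < max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept


# Invented example hits, for illustration only.
example_report = (
    "DndA_query\thypothetical_1\t45.2\t380\t200\t5\t1\t378\t3\t380\t3e-50\t190\n"
    "DndA_query\thypothetical_2\t28.0\t150\t100\t4\t10\t155\t20\t165\t0.002\t38.1\n"
    "DndA_query\thypothetical_3\t31.5\t200\t130\t3\t5\t200\t8\t204\t5e-06\t52.0\n"
)
weak_homologues = filter_hits(example_report)
print([subject for _, subject, _ in weak_homologues])
```

Because E-values depend on database size, a threshold chosen against the NCBI nr database would not necessarily carry over unchanged to a smaller, curated collection.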

We have used dndDB and associated experimentation to analyse the DndA, DndB, DndC, DndD and DndE proteins of S. lividans and have used these data to predict their putative biological functions, thus shedding light on the novel DNA phosphorothioation biochemical pathway. The DndA protein is a likely cysteine desulfur-transferase that is proposed to provide sulphur via its L-cysteine desulfurylase activity (see Figure 3 for an outline of relevant data) [13]. DndB is a predicted Fe-S cluster binding protein, which we hypothesize affects modification specificity through its action as a transcriptional regulator. Similarly, DndC is proposed to contain a [4Fe-4S] cluster and has predicted ATP pyrophosphatase activity, features paralleling those of IscS and ThiI [14] which are involved in tRNA sulfur modification in Escherichia coli. DndD is a putative ATPase with DNA nicking activity which may couple ATP hydrolysis to DndE, a putative sulphur-transferase. However, much more detailed analyses and experimentation will be necessary to finalize the precise nature of the dnd biochemical pathway.


Figure 3. Organization of Dnd proteins and conserved domains in dndDB.

(A) DndA protein data that have been used to predict its putative biological function as a likely cysteine desulfur-transferase in Streptomyces lividans. (B) Multiple amino acid sequence alignment of DndA proteins highlighted the conserved domain in Pfam (accession no. PF00266). (C) and (D) Phylogenetic trees drawn on basis of DndA amino acid sequences and 16S rDNA sequences of the host organisms, respectively. (E) A 3-D structural image corresponding to a DndA-related protein (PDB ID: 1p3w). (G) Sample experimental data demonstrating that DndA provides sulphur via its L-cysteine desulfurylase activity [13]. (F) Inferred biochemical reaction, in which DndA is predicted to catalyze the assembly of DndC as an iron–sulfur cluster protein [13].

doi:10.1371/journal.pone.0005132.g003

Search tools

The dndDB web server offers several search tools with varied options. Through the ‘Search’ page, users can retrieve Dnd phenotype, gene or protein homologues from dndDB by organism name. Via the ‘Blast vs dndDB’ page, users are able to blast a query sequence against dndDB to find homologous matches with WU-BLAST 2.0 [Gish, W., personal communication]. Finally, the ‘tBlastn for Dnd’ page utilizes an NCBI tBlastn-based tool that we developed to predict potential Dnd proteins in user-supplied nucleotide sequences.
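A search by organism name of the kind offered on the ‘Search’ page can be pictured as a parameterised SQL query. The sketch below uses Python's built-in sqlite3 module as a stand-in for the MySQL server, with a one-table schema that is purely hypothetical; the actual dndDB schema is not described in this article. The S. lividans rows reflect the five-gene dndABCDE cluster, while the second organism is a placeholder.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical dnd_gene table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dnd_gene (organism TEXT, gene TEXT)")
    rows = [("Streptomyces lividans 66", gene)
            for gene in ("dndA", "dndB", "dndC", "dndD", "dndE")]
    rows.append(("Placeholder species", "dndC"))  # illustrative row only
    conn.executemany("INSERT INTO dnd_gene VALUES (?, ?)", rows)
    return conn


def genes_for_organism(conn, name_fragment):
    """Return gene names for organisms whose name contains name_fragment.

    The fragment is passed as a bound parameter rather than being
    interpolated into the SQL string.
    """
    cursor = conn.execute(
        "SELECT gene FROM dnd_gene WHERE organism LIKE ? ORDER BY gene",
        ("%" + name_fragment + "%",),
    )
    return [row[0] for row in cursor.fetchall()]


conn = build_demo_db()
lividans_genes = genes_for_organism(conn, "lividans")
print(lividans_genes)
```

Binding the search term as a parameter keeps user input out of the SQL text, which matters for any publicly accessible web form.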

As future developments, we will shortly be uploading a large set of sequences which exhibit homology to isolated dnd genes, as opposed to dnd clusters only, and a further set corresponding to homologues of the full complement of non-dnd genes borne on dnd islands. We will continue to identify additional syntenic clusters, isolated dnd-like genes and other dnd island gene homologues as gene, genome and metagenome databases expand, and anticipate eventually providing a pipeline for ready automated discovery, annotation and analyses of dnd genes, clusters and associated genomic islands.

We envisage an evolving resource that seeks to effectively combine and interlink the genetics, biochemistry and functional aspects of dnd systems and their associated genomic islands. Such a unified resource will facilitate efficient investigation of a wide range of aspects relating to dnd DNA modification processes and other island-encoded functions in diverse host organisms. We also believe that the lessons learnt from ongoing dissection of the dnd system will provide clues to resolve mysteries relating to weakly similar genes, proteins and biochemical reactions, and in due course give rise to novel biotechnological and/or clinical applications; thus we expect that dndDB will prove to be of interest to a broad community of researchers.

Materials and Methods

The dndDB database runs on a Linux platform (Fedora core 5) with the Apache web-server (version 2.2.0), MySQL server (v 5.0.22), PHP (v 5.1.4), Perl (v 5.8.8) and Bioperl (v 2.1) [15]. In addition, the following freely available components were employed: NCBI Blast 2.2.9 [16], WU-BLAST 2.0 [Gish, W., personal communication], Muscle 3.7 [7], GBrowse 1.69 [9] and JalView 2.4 [8]. dndDB version 1.0 is freely available for research activities and non-commercial use at http://mml.sjtu.edu.cn/dndDB/. The Java platform (http://www.java.com/) is required for web browser-based visualisation of Muscle-generated phylogenetic trees using JalView.

The current version of dndDB includes the following information: (i) a list of 24 bacterial species exhibiting the Dnd phenotype and associated publications; (ii) details of dnd gene clusters from 25 species of Eubacteria and Archaea that were identified based on both sequence similarity and gene order (synteny) by employing Blastp searches against complete and partially sequenced genomes available at the NCBI server; (iii) details of laterally acquired genomic islands harbouring dnd genes that were predicted using the GBrowse viewer (Generic Genome Browser) [9], MobilomeFINDER server [17], Z Curve database online utility [18] and/or interactive Artemis Comparison Tool (WebACT) [19]; and (iv) an archive of Dnd proteins and other potentially related proteins showing BLAST-based similarity, and corresponding conserved domains identified in the protein family database Pfam [20] and the Conserved Domain Database (CDD) [21].
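Combining sequence similarity with gene order, as in item (ii), can be sketched as a scan for runs of adjacent genes whose best Dnd matches occur in a fixed order. The exact synteny criterion used for dndDB is not given here, so the required order (dndB to dndE, in either orientation) and the input representation are assumptions chosen for illustration.

```python
def find_syntenic_clusters(gene_labels, order=("dndB", "dndC", "dndD", "dndE")):
    """Return start indices where `order` occurs as a contiguous run.

    gene_labels lists, in chromosomal order, the Dnd family assigned to
    each gene by a similarity search (None where there is no hit). A run
    is accepted in either orientation, since a cluster may lie on either
    strand.
    """
    forward = list(order)
    reverse = forward[::-1]
    width = len(forward)
    starts = []
    for i in range(len(gene_labels) - width + 1):
        window = gene_labels[i:i + width]
        if window == forward or window == reverse:
            starts.append(i)
    return starts


# Toy annotation: one complete run plus an isolated dndC-like gene.
toy_labels = [None, "dndA", "dndB", "dndC", "dndD", "dndE", None, "dndC", None]
cluster_starts = find_syntenic_clusters(toy_labels)
print(cluster_starts)
```

The isolated dndC-like gene at index 7 is not reported, which mirrors the distinction drawn in the text between syntenic clusters and isolated dnd genes.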

dndDB currently contains details of over 114 dnd genes and their cognate proteins from the Eubacterial and Archaeal kingdoms, and is expected to grow quickly with the rapid development of genome sequencing projects and the ongoing refinement of strategies to identify distantly related gene clusters, orphan dnd genes, and functionally or biochemically related proteins. As more information about the Dnd system becomes available, the database will be expanded and improved accordingly.

In addition, brief descriptions of ongoing research into the dnd system by our group and collaborators are also incorporated into dndDB to foster dialogue and participation by the wider research community. These include work on a putative Dnd-dependent restriction-modification system, the precise nature of the DNA modification itself, the core sequence motif that targets the site-specific modification in S. lividans [22], and the increasingly well characterized novel biochemical pathway that mediates this unique biological process. Future contributions from other researchers will be sought via dndDB.

Author Contributions

Analyzed the data: HYO XH KR ZD. Contributed reagents/materials/analysis tools: HYO YS CT. Wrote the paper: HYO.

References

  1. Zhou X, Deng Z, Firmin JL, Hopwood DA, Kieser T (1988) Site-specific degradation of Streptomyces lividans DNA during electrophoresis in buffers contaminated with ferrous iron. Nucleic Acids Res 16: 4341–4352.
  2. Zhou X, He X, Liang J, Li A, Xu T, et al. (2005) A novel DNA modification by sulphur. Mol Microbiol 57: 1428–1438.
  3. Wang L, Chen S, Xu T, Taghizadeh K, Wishnok JS, et al. (2007) Phosphorothioation of DNA in bacteria by dnd genes. Nat Chem Biol 3: 709–710.
  4. Zhou X, He X, Li A, Lei F, Kieser T, et al. (2004) Streptomyces coelicolor A3(2) lacks a genomic island present in the chromosome of Streptomyces lividans 66. Appl Environ Microbiol 70: 7110–7118.
  5. He X, Ou HY, Yu Q, Zhou X, Wu J, et al. (2007) Analysis of a genomic island housing genes for DNA S-modification system in Streptomyces lividans 66 and its counterparts in other distantly related bacteria. Mol Microbiol 65: 1034–1048.
  6. Dobrindt U, Hochhut B, Hentschel U, Hacker J (2004) Genomic islands in pathogenic and environmental microorganisms. Nat Rev Microbiol 2: 414–424.
  7. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32: 1792–1797.
  8. Clamp M, Cuff J, Searle SM, Barton GJ (2004) The Jalview Java alignment editor. Bioinformatics 20: 426–427.
  9. Stein LD, Mungall C, Shu S, Caudy M, Mangone M, et al. (2002) The generic genome browser: a building block for a model organism system database. Genome Res 12: 1599–1610.
  10. Wang H, Su Y, Mackey AJ, Kraemer ET, Kissinger JC (2006) SynView: a GBrowse-compatible approach to visualizing comparative genome data. Bioinformatics 22: 2308–2309.
  11. Letunic I, Bork P (2007) Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 23: 127–128.
  12. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, et al. (2000) The Protein Data Bank. Nucleic Acids Res 28: 235–242.
  13. You D, Wang L, Yao F, Zhou X, Deng Z (2007) A novel DNA modification by sulfur: DndA is a NifS-like cysteine desulfurase capable of assembling DndC as an iron-sulfur cluster protein in Streptomyces lividans. Biochemistry 46: 6126–6133.
  14. You D, Xu T, Yao F, Zhou X, Deng Z (2008) Direct evidence that ThiI is an ATP pyrophosphatase for the adenylation of uridine in 4-thiouridine biosynthesis. Chembiochem 9: 1879–1882.
  15. Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, et al. (2002) The Bioperl toolkit: Perl modules for the life sciences. Genome Res 12: 1611–1618.
  16. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
  17. Ou HY, He X, Harrison EM, Kulasekara BR, Thani AB, et al. (2007) MobilomeFINDER: web-based tools for in silico and experimental discovery of bacterial genomic islands. Nucleic Acids Res 35: W97–W104.
  18. Zhang CT, Zhang R, Ou HY (2003) The Z curve database: a graphic representation of genome sequences. Bioinformatics 19: 593–599.
  19. Abbott JC, Aanensen DM, Rutherford K, Butcher S, Spratt BG (2005) WebACT–an online companion for the Artemis Comparison Tool. Bioinformatics 21: 3665–3666.
  20. Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, et al. (2008) The Pfam protein families database. Nucleic Acids Res 36: D281–288.
  21. Marchler-Bauer A, Anderson JB, Derbyshire MK, DeWeese-Scott C, Gonzales NR, et al. (2007) CDD: a conserved domain database for interactive domain family analysis. Nucleic Acids Res 35: D237–240.
  22. Liang J, Wang Z, He X, Li J, Zhou X, et al. (2007) DNA modification by sulfur: analysis of the sequence recognition specificity surrounding the modification sites. Nucleic Acids Res 35: 2944–2954.