Reader Comments


Corrections including the missing references (part 1)

Posted by dettai on 03 Jan 2013 at 11:27 GMT

We propose a novel approach for the isolation and sequencing of a universal, useful and popular marker across distant, non-model metazoans: the complete mitochondrial genome. 
We focus here on an NGS-based approach for the isolation and sequencing of a universal, useful and popular marker across distant, non-model metazoans: the complete mitochondrial genome. 

Our method can yield sequences both from identified samples and metagenomic samples.
The method can yield sequences both from identified samples and metagenomic samples.

Technical issues, like PCR primer universality or reference dataset completeness are a limiting factor ([9], [6]).
Technical issues, like PCR primer universality or reference dataset completeness, are limiting factors ([9], [6], [95]).

We propose here an approach for biodiversity studies based on the intrinsic properties of the metazoan mitochondrial DNA and long established molecular techniques for marker isolation, followed by multiplex sequencing of divergent organisms. Its design permits to link the sequences to the specimen they were obtained from without additional experimental steps, while allowing multiplexing to fully exploit the power of the NGST.
We focus here on an approach for biodiversity studies based on the intrinsic properties of the metazoan mitochondrial DNA and long established molecular techniques for marker isolation, followed by multiplex sequencing of divergent organisms. Its design makes it possible to link the sequences to the specimen they were obtained from without additional experimental steps, while allowing multiplexing to fully exploit the power of the NGST. A similar approach was proposed for shotgun sequencing in 2000 [94]. To our knowledge, this was the first suggestion of multiplexing highly divergent samples in order to remove the need for tagging. The authors' concern was to simplify and accelerate access to genomic sequence diversity by pooling divergent sources for sequencing and then reconstructing genomes afterwards. Recent progress in sequencing technologies has made this idea effective and competitive, as was demonstrated by a recent study on Coleoptera using long PCRs and 454 sequencing [95].

Yet for all its qualities, a practical approach to exploit it for a wide array of taxa using NGST is missing. Mitogenomes are still mostly being sequenced using Sanger technology and a very large number of primers ([20],[21]), or NGST using tags [22] or one sequencing run per mitogenome [23].
Yet for all the qualities of mitogenomes and despite the publication of [94, 95], practical approaches to exploit them for a wide array of taxa using NGST are not in widespread use. Mitogenomes are still mostly being sequenced using Sanger technology and a very large number of primers ([20],[21]), or NGST using tags [22] or one sequencing run per mitogenome [23], with few exceptions [95].

We estimate that the method we propose could reduce the necessary time to sequence a sample by days, and divide the cost by up to a 100 times.
We estimate that the present approach ([94], [95]) could reduce the time needed to sequence a sample by days, and reduce the cost by up to 100-fold.

Their total genomic DNA is extracted, and enriched in mitochondrial DNA.
Their total genomic DNA is extracted, and enriched in mitochondrial DNA [94].

No competing interests declared.

Corrections including the missing references (part 2)

dettai replied to dettai on 03 Jan 2013 at 11:32 GMT

The enriched extract is sequenced directly or amplified and sequenced. Individual mitochondrial genomes can then be assembled, as they are divergent enough to be distinct along their whole length.
The enriched extract is sequenced directly or amplified and sequenced. Individual mitochondrial genomes can then be assembled, as they are divergent enough to be distinct along their whole length [94].

The assembled mitogenomes are linked to their specimen of origin by comparing their COI and/or other mitochondrial marker with the identified sequences present in a reference database, effectively using the sequences themselves as tags for sorting.
The assembled mitogenomes are linked to their specimen of origin by comparing their COI and/or other mitochondrial markers with identified sequences, either present in a reference database or sequenced specifically for this purpose, effectively using the sequences themselves as tags for sorting [95].
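
The sorting step described above can be sketched as follows. This is only a minimal illustration in Python, not the procedure used in the article or in [95]: it scores each assembled contig against each identified reference marker by the fraction of the reference's k-mers found in the contig (on either strand), and assigns the contig to the best-scoring specimen. The function names, the k-mer size and the 0.8 cut-off are hypothetical; in practice an alignment-based search would more likely be used.

```python
# Hypothetical sketch: link assembled contigs to specimens using
# identified marker sequences (e.g. partial COI) as built-in tags.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all k-mers of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_contigs(contigs, references, k=8, min_score=0.8):
    """Map each contig name to (best specimen or None, score).

    contigs:    dict of contig name -> assembled sequence
    references: dict of specimen name -> identified marker sequence
    k, min_score: illustrative values, not taken from the article
    """
    assignments = {}
    for cname, cseq in contigs.items():
        # Contigs may be assembled on either strand, so index both.
        contig_kmers = kmers(cseq, k) | kmers(revcomp(cseq), k)
        best_name, best_score = None, 0.0
        for rname, rseq in references.items():
            ref_kmers = kmers(rseq, k)
            if not ref_kmers:
                continue  # reference shorter than k
            score = len(ref_kmers & contig_kmers) / len(ref_kmers)
            if score > best_score:
                best_name, best_score = rname, score
        if best_score < min_score:
            best_name = None  # no convincing match: leave unassigned
        assignments[cname] = (best_name, best_score)
    return assignments
```

A contig left unassigned under this scheme would correspond to a specimen lacking a reference marker, which is the case where a short Sanger-sequenced fragment could serve as the identifier.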

We describe here the approach, the type of sequence data it generates, the procedure to recover mitochondrial genomes without external tagging, and some potential uses. We perform an in-silico validation test based on the analysis of a simulated dataset with read lengths of two different sizes to represent average read length of three 2nd generation desktop sequencing platforms, Illumina Mi-Seq, 454 GS junior and Ion Torrent PGM. Thus we can contrast their relative efficiencies for the experimental protocol proposed here.
We detail the approach here, widening it compared to previous studies in both the proposed lab approach and the taxonomic scope, while going over the type of sequence data it generates, the procedure to recover mitochondrial genomes without external tagging [94, 95], and some new potential uses. We provide both analysis results and an approach to determine easily when sequences can be safely multiplexed. We perform an in-silico validation test based on the analysis of a simulated dataset with two different read lengths to represent the average read length of three 2nd generation desktop sequencing platforms, Illumina MiSeq, 454 GS Junior and Ion Torrent PGM, to extend the conclusions of the 454 case study on 30 Coleoptera species [95]. Thus we can contrast the relative efficiencies of different sequencers for the experimental protocol.

A 100 sequence dataset was assembled on the basis of the p-distance values from the previous analysis (File S1). The dataset was selected from a single taxonomic group (Actinopterygian fish) to replicate a realistic sequencing for a phylogenetic or systematics study in a specialised laboratory.
A 100-sequence dataset was assembled on the basis of the p-distance values from the previous analysis (File S1). The dataset was selected from a single taxonomic group (actinopterygian fish) to replicate a realistic sequencing project for a phylogenetic or systematics study in a specialised laboratory. It tests a dataset with more species than [95] (100 vs 30 species), on multiple sequencing platforms.

Our approach is based on local dissimilarity for post-sequencing de-multiplexing of the sequence pool, and therefore requires a careful selection of specimens before multiplexing runs. The goal is to pool samples possessing sufficient divergence all along their mitochondrial genomes, so that every individual sequence can be singled out at the sequence assembly stage, and then identified using the reference data as one of the samples included in the run.
The approach is based on local dissimilarity for post-sequencing de-multiplexing of the sequence pool, and therefore requires a careful selection of specimens before multiplexing runs [94]. The goal is to pool samples possessing sufficient divergence all along their mitochondrial genomes, so that every individual sequence can be singled out at the sequence assembly stage [94, 95], and then identified using the reference data as one of the samples included in the run [95].
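
To illustrate the kind of check this selection implies, the sketch below computes the p-distance (proportion of differing sites) in sliding windows along pairs of aligned mitogenomes and flags any pair whose least divergent window falls under a threshold. It is a minimal Python example under stated assumptions: the sequences are already aligned to equal length, and the function names, window size, step and 0.05 threshold are hypothetical placeholders, not values from the article.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, ignoring gap and N positions."""
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            diffs += 1
    # A window with no comparable site is treated as distance 0,
    # i.e. conservatively counted as "not shown to be divergent".
    return diffs / compared if compared else 0.0


def min_window_distance(a, b, window=100, step=50):
    """Smallest sliding-window p-distance between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    starts = range(0, max(len(a) - window, 0) + 1, step)
    return min(p_distance(a[s:s + window], b[s:s + window]) for s in starts)


def incompatible_pairs(aligned, window=100, step=50, threshold=0.05):
    """List (name1, name2, distance) for pairs too similar to pool.

    aligned: dict of sample name -> aligned sequence.
    threshold: illustrative minimum local divergence (assumption).
    """
    flagged = []
    for (n1, s1), (n2, s2) in combinations(sorted(aligned.items()), 2):
        d = min_window_distance(s1, s2, window, step)
        if d < threshold:
            flagged.append((n1, n2, d))
    return flagged
```

Samples appearing in a flagged pair would be placed in separate runs, or in separately tagged pools, rather than multiplexed together.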

These should not be combined in a single sequencing run with the current sequence length and error rates, but with new sequencing techniques with low error rates and very long sequences, it might become possible to sort them.
These should not be combined in a single sequencing run with the current sequence length and error rates (although see [95]), but with new sequencing techniques with low error rates and very long sequences, it might become possible to sort them.

Molecular identification ease and quality relies on the representation in a reference dataset ([9], [6]).
The ease and quality of molecular identification rely on the representation in a reference dataset ([9], [6], [95]).

Alternatively, a few sequences for partial COI or ribosomal markers could be sequenced trough Sanger technique for species with no reference in databases to serve as reference for demultiplexing.
Alternatively, a few partial COI or ribosomal marker sequences could be obtained through Sanger sequencing for species with no reference in databases, to serve as references for demultiplexing [95].

No competing interests declared.

Corrections including the missing references (part 3)

dettai replied to dettai on 03 Jan 2013 at 11:35 GMT

Even with an efficiency and a number of mitogenomes per sequencing one order of magnitude lower than what we calculated here, the cost of a complete mitogenome would be considerably lower than the current PCR and Sanger-sequencing based approach.
Even with an efficiency and a number of mitogenomes per sequencing run one order of magnitude lower than what we calculated here, the cost of a complete mitogenome would be considerably lower than with the current PCR and Sanger-sequencing based approach, as had already been suggested even using cloning and shotgun sequencing [94], and the process would be much faster [95].

In these cases, tagging smaller pools of multiplexed mitogenomes would involve tractable numbers of samples without having to tag thousands of samples separately.
In these cases, tagging smaller pools of multiplexed mitogenomes would involve tractable numbers of samples without having to tag thousands of samples separately [95].

However, processes for enrichment or specific isolation of mitochondrial DNA versus nuclear DNA have been mastered for decades, using the physical properties of the mitochondrial genomes, especially size and composition.
However, processes for enrichment or specific isolation of mitochondrial DNA versus nuclear DNA have been mastered for decades, using the physical properties of the mitochondrial genomes, especially size and composition, and can be used in this approach [94].

Assembly can be based either on the existing complete mitogenome datasets [27], or using de novo assemblers ([27], [46], [47]), depending on the type of sequence output. The risk of recovering chimeric sequences combining several mitogenomes is low, first because of the preliminary choice of the specimens, then because considerable work has been done on allele separation and identification for diploid genomes [7], and settings can be fine tuned to get the best results
Assembly can be based either on the existing complete mitogenome datasets [27], or on de novo assemblers ([27], [46], [47], [95]), depending on the type of sequence output. The risk of recovering chimeric sequences combining several mitogenomes is low, as demonstrated by [95] even with closely related species, first because of the preliminary choice of the specimens [94], then because considerable work has been done on allele separation and identification for diploid genomes [7], and settings can be fine-tuned to get the best results.

Building contigs overlapping large parts, or even whole mitogenomes, appears to be a reachable goal when coverage is sufficient. These coverage values are in agreement with the coverage cited for other types of genome sequencing.
Building contigs overlapping large parts, or even whole mitogenomes, appears to be a reachable goal when coverage is sufficient. Real 454 data generated using long PCRs could indeed be assembled into long contigs [95]. The coverage values listed here are in agreement with the coverage cited for other types of genome sequencing.
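
As a rough guide to what sufficient coverage means for a pooled run, the expected mean coverage per mitogenome can be approximated as below. This is a standard back-of-the-envelope estimate added for illustration, not a formula from the article; all symbols are introduced here, and it assumes the pooled mitogenomes are of similar length and equally represented, which the simulations do not require.

```latex
% N       : total number of reads in the run
% \bar{L} : mean read length (bp)
% f       : fraction of reads that are mitochondrial (after enrichment)
% m       : number of mitogenomes pooled in the run
% G       : mitogenome length (bp)
C \approx \frac{N \, \bar{L} \, f}{m \, G}
```

Under unequal representation, the least abundant sample has lower coverage than this average, so it is the one that limits how many samples can be pooled.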

While these simulations, intended to demonstrate a practical use in a specialised lab, used a dataset of actinopterygian teleost genomes, we are confident that similar results would be obtained using sequences from other groups, as the assembly depends chiefly on sequence divergence (explored in the sliding window analyses for a large diversity of phylums) and sequence length, which is imposed by the choice of sequencer rather than of the taxa.
While these simulations, intended to demonstrate a practical use in a specialised lab, used a dataset of actinopterygian teleost genomes, and the real data test used insect genomes [95], we are confident that similar results would be obtained using sequences from other groups, as the assembly depends chiefly on sequence divergence ([94], and explored in the sliding window analyses for a large diversity of phyla) and sequence length, which is imposed by the choice of sequencer rather than of the taxa.

Linking the sequences with a precise specimen is advisable not only for the general quality of the data, but also so that the sequences can be used for intraspecific studies, as well as when cryptic species might be present [53].
Linking the sequences with a precise specimen is advisable not only for the general quality of the data [94], but also so that the sequences can be used for intraspecific studies, as well as when cryptic species might be present [53].

Moreover, there are tens of thousands of species represented for not only COI, but also other mitochondrial markers, chiefly cytochrome b, 12S and 16S rRNA.
Moreover, there are tens of thousands of species represented for not only COI, but also other mitochondrial markers, chiefly cytochrome b, 12S and 16S rRNA, and additional sequences can be generated by Sanger sequencing if needed [95].

The analysis methods of the datasets can go from the use of partial sequence analysis for identification purposes (barcoding approaches) to fully fledged phylogenetic analyses on long sequence alignments, gene order coding studies and functional comparative analyses. The usefulness of the mitochondrial markers, and of the complete mitogenome for systematics has been recurrently demonstrated over the years ([13], [21], [58], [59], [60])
The analysis methods of the datasets can range from the use of partial sequence analysis for identification purposes (barcoding approaches) [95] to fully fledged phylogenetic analyses on long sequence alignments, gene order coding studies and functional comparative analyses ([94], [95]). The usefulness of the mitochondrial markers, and of the complete mitogenome, for systematics has been recurrently demonstrated over the years ([13], [21], [58], [59], [60], [94], [95]).

No competing interests declared.

Corrections including the missing references (part 4, last)

dettai replied to dettai on 03 Jan 2013 at 11:35 GMT

Non-binding or low binding efficiency of primers to some target DNAs because of sequence divergence is a serious problem for all mitochondrial markers, jeopardizing amplification and study of some samples and some groups and requiring multiple primers combinations ([70], [71], [20], [6]).
Non-binding or low binding efficiency of primers to some target DNAs because of sequence divergence is a serious problem for all mitochondrial markers, jeopardizing amplification and study of some samples and some groups and requiring multiple primer combinations ([70], [71], [20], [6], [95]).

The approach we propose here has the potential to open or speed up very wide fields of research, harnessing new technologies to benefit biodiversity studies.
This approach has the potential to open or speed up very wide fields of research, harnessing new technologies to benefit biodiversity studies [94], [95].

With good coverage, it is possible to recover large contigs for a number of the sequenced mitogenomes, at least in the simulations, even without equimolarity of the samples. However, our simulations and assemblies show that existing programs for genome assembly lack some important features when applied to mitochondrial genomes. The circularity of mitochondrial genomes is not yet taken into account, which could lead to reconstruction problems.
With good coverage, it is possible to recover large contigs for a number of the sequenced mitogenomes ([94], [95]), both in the simulations and in tests on real data [95], even without equimolarity of the samples. However, our simulations and assemblies show that existing programs for genome assembly lack some important features when applied to complete mitochondrial genomes. The circularity of mitochondrial genomes is not yet taken into account, which could lead to reconstruction problems. These problems are not present for mitochondrial sequence segments obtained by long PCRs [95], but identifying sequence datasets are then needed for every PCR segment.

While this approach is not appropriate for all situations, it provides a solution for a large number of cases that were until now technically problematic. There are also economic considerations, as the approach does not require sample tagging, which can represent a considerable part of the sequencing cost. The approach is not sequencer or kit specific, and can be adapted to the availability of each, although the sequencers generating longer sequences will give better results even at lower coverages, as will the use of paired ends.
While this approach is not appropriate for all situations, it provides a solution for a large number of cases. There are also economic considerations, as the approach does not require sample tagging, which can represent a considerable part of the sequencing cost and effort [95]. The approach is not sequencer or kit specific, and applies equally to former [94] or potential future technologies. It can be adapted to the availability of each, although the sequencers generating longer sequences will give better results even at lower coverages, as will the use of paired ends.

The need to mix numerous samples with divergent sequences makes our proposal of little interest for research groups working on a small number of specimens and/or closely related taxa.
The need to mix numerous samples with divergent sequences makes this approach of little interest for research groups working on a small number of specimens and/or closely related taxa (although see [95]).

For studies based on degraded DNA, availability of reference identification datasets covering the whole mitogenome would provide sequence data to explore alternative markers more suited to each group, and help in the development of primers for divergent groups [93].
For studies based on degraded DNA, availability of reference identification datasets covering the whole mitogenome [95] would provide sequence data to explore alternative markers more suited to each group, and help in the development of primers for divergent groups [93].

94. Pollock DD, Eisen JA, Doggett NA, Cummings MP (2000) A case for evolutionary genomics and the comprehensive examination of sequence biodiversity. Mol Biol Evol 17: 1776-1788.
95. Timmermans MJ, Dodsworth S, Culverwell CL, Bocak L, Ahrens D, Littlewood DT, Pons J, Vogler AP (2010) Why barcode? High-throughput multiplex sequencing of mitochondrial genomes for molecular systematics. Nucleic Acids Res 38(21): e197. doi: 10.1093/nar/gkq807.

No competing interests declared.