Taxonomic Reliability of DNA Sequences in Public Sequence Databases: A Fungal Perspective

R. Henrik Nilsson; Martin Ryberg; Erik Kristiansson; Kessy Abarenkov; Karl-Henrik Larsson; Urmas Kõljalg

doi:10.1371/journal.pone.0000059

Open Access

Peer-reviewed

Research Article

R. Henrik Nilsson*
Affiliation: Department of Plant and Environmental Sciences, Göteborg University, Göteborg, Sweden

Martin Ryberg
Affiliation: Department of Plant and Environmental Sciences, Göteborg University, Göteborg, Sweden

Erik Kristiansson
Affiliation: Department of Mathematical Statistics, Chalmers University of Technology, Göteborg, Sweden

Kessy Abarenkov
Affiliation: Institute of Botany and Ecology, University of Tartu, Tartu, Estonia

Karl-Henrik Larsson
Affiliation: Department of Plant and Environmental Sciences, Göteborg University, Göteborg, Sweden

Urmas Kõljalg
Affiliation: Institute of Botany and Ecology, University of Tartu, Tartu, Estonia

*To whom correspondence should be addressed. E-mail: henrik.nilsson@botany.gu.se

Published: December 20, 2006
https://doi.org/10.1371/journal.pone.0000059

Reader Comments

Reliability of sequence data in GenBank, EMBL and DDBJ?

Posted by JanS on 15 Jan 2007 at 00:58 GMT

that the reference database features a satisfactory taxonomic sampling of sequences

that the sequences in the reference database are correctly identified and annotated

that the process of translating the comparison into species names is standardized, universally adopted, and not easily misunderstood
http://plosone.org/article/info:doi/10.1371/journal.pone.0000059#article1.body1.sec1.p1

Dear all,

I have enjoyed reading the article and found it a very good treatment of the problems associated with entries in GenBank, EMBL and DDBJ.

In a traditional/old-fashioned museum (with all the jars of strange-looking de-pigmented organisms, pinned butterflies, skeletons etc.) real but dead organisms are reliably stored, categorized and annotated by a curator. If there is any doubt about the taxonomic affiliation, one can always examine the specimen. Unlike this traditional museum, GenBank, EMBL and DDBJ store only DNA sequences plus fragmentary annotation of some organism. Unfortunately, in the majority of cases this organism is not linked to a repository (= traditional museum) holding the dead organism or, in the case of cultures, to a culture repository (e.g. ATCC). GenBank, EMBL and DDBJ probably have no means of regulating this problem, nor the depth of annotation of the sequences as presented by Nilsson et al. in their article, because it all depends on the individual researcher who submits the data.

The point that I would like to highlight is an addition to Nilsson et al.’s “assumptions” in their introduction. One has to also assume that the sequence has been generated correctly and has no errors caused by technical issues (not to be confused by polymorphism).

But really, how reliable are the sequence data in GenBank, EMBL and DDBJ? Here I would like to direct your attention to this excellent article, which is well worth reading:

Harris DJ (2003) Can you bank on GenBank? Trends in Ecology & Evolution 18(7) 317-319. http://dx.doi.org/10.1016/S0169-5347(03)00150-2

Anyway, I can only agree that more control over the annotation as well as the sequence is needed from the publishers, repositories and last but not least authors.

Cheers, JanS

RE: Reliability of sequence data in GenBank, EMBL and DDBJ?

RHNi replied to JanS on 07 Feb 2007 at 15:45 GMT

>The point that I would like to highlight is an addition to Nilsson
>et al.’s “assumptions” in their introduction. One has to also assume
>that the sequence has been generated correctly and has no errors caused
>by technical issues (not to be confused by polymorphism).

You bring up a good point here. I'd say the “technical quality” of the sequences is just taken for granted most of the time, whereas in reality we have all seen Sequencher or Staden trying to make sense of noisy input data and making equivocal basecalls. (On a side note, the sequence with the highest number of IUPAC ambiguities, if I remember correctly, sported a full 85% of them.) I agree that we should probably have been more explicit about the “technical quality” in the Introduction.

Thanks for sharing your thoughts,

Sincerely,

Henrik N
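
The share of IUPAC ambiguity codes mentioned in the reply above can be checked for any sequence with a few lines of code. The following is a minimal Python sketch, not the procedure used in the article: the function name and the example sequence are hypothetical, and it assumes that only letters count as bases (gaps and whitespace are ignored).

```python
# IUPAC nucleotide ambiguity codes, i.e. every code other than
# the unambiguous A, C, G, T and U.
IUPAC_AMBIGUITY_CODES = frozenset("RYSWKMBDHVN")


def ambiguity_fraction(sequence: str) -> float:
    """Return the share of bases in `sequence` that are IUPAC ambiguity codes.

    Hypothetical helper for illustration. Non-letter characters such as
    gaps ('-') and whitespace are skipped before counting.
    """
    bases = [ch for ch in sequence.upper() if ch.isalpha()]
    if not bases:
        return 0.0
    ambiguous = sum(1 for ch in bases if ch in IUPAC_AMBIGUITY_CODES)
    return ambiguous / len(bases)


if __name__ == "__main__":
    # Made-up sequence: 4 ambiguous positions (N, N, R, Y) out of 12.
    example = "ACGTNNRYACGT"
    print(f"{ambiguity_fraction(example):.1%} ambiguous bases")
```

A threshold on this fraction could serve as a crude filter for technically poor entries, although what cutoff is sensible would depend on the marker and the data set.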

RE: RE: Reliability of sequence data in GenBank, EMBL and DDBJ?

MarkvP replied to RHNi on 30 Jun 2007 at 23:22 GMT

Dear all,

with respect to the 'power of participation' of the (scientific) community, I would like to suggest the paper by S.L. Salzberg in Genome Biology

S.L. Salzberg
Genome re-annotation: a wiki solution?
Genome Biol. 2007;8(1):102

I don't think it's open access: isn't that weird, an open access journal discussing an open access wiki approach, yet with restricted access to the article!?!!

Anyway, I am sure you can get the paper if you try.

Good luck

Mark

RE: RE: RE: Reliability of sequence data in GenBank, EMBL and DDBJ?

RHNi replied to MarkvP on 11 Jul 2007 at 20:05 GMT

Dear Mark,

I have requested the possibility to leave comments on particular INSD entries on multiple occasions, but I have never had a satisfactory reply. Such a feature would surely alleviate the concerns with, e.g., misidentified entries: it would be highly useful to read other persons' warnings and reservations about particular entries. I am thinking particularly of the ITS sequences that are submitted as belonging to reindeer but that really are ascomycetes - a word of warning would certainly be beneficial here.

I fully agree with you on the absurdity of publishing a plea for open access / contribution style approaches - in a non-open access paper. Maybe the author was less into open access and more into high impact factors after all?

(Genome Biology is partly open access, right?)

Best,

Henrik N
