Reader Comments
Collaboration with tool authors required
Posted by idot on 13 Aug 2009 at 13:40 GMT
I think this article shows why any serious tool evaluation should be a community effort conducted together with the tool authors themselves. Of course a tool should have sensible defaults, but the more versatile a tool is and the more requirements it can fulfill, the more esoteric its options become. To really evaluate a tool one would have to study it in more detail, and to properly use a tool the same level of knowledge is necessary.
Let me just comment on the bowtie settings used (I sometimes use bowtie myself to quickly map reads, and I am not affiliated with its authors):
-k 1, -n 3, -e 2000
-n is the maximum number of mismatches in the seed(!), and -e 2000 sets a threshold on the quality-weighted Hamming distance, not on the total number of mismatches. The latter would have been the -v <int> option, for which I don't think a maximum exists (as the authors state in the paper).
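The distinction between the two mismatch policies can be sketched in a few lines. This is a hypothetical illustration, not bowtie's implementation: a -v-style limit bounds the plain mismatch count, while an -e-style limit bounds the sum of Phred quality scores at the mismatched positions, so low-quality mismatches are penalized less.

```python
def mismatch_count(read, ref):
    """Plain Hamming distance: the quantity a -v-style limit would bound."""
    return sum(1 for r, f in zip(read, ref) if r != f)

def quality_weighted_distance(read, ref, quals):
    """Sum of Phred qualities at mismatched positions: the quantity -e bounds."""
    return sum(q for r, f, q in zip(read, ref, quals) if r != f)

read  = "ACGTACGT"
ref   = "ACGAACGA"                      # mismatches at positions 3 and 7
quals = [40, 40, 40, 10, 40, 40, 40, 35]

print(mismatch_count(read, ref))                    # 2
print(quality_weighted_distance(read, ref, quals))  # 10 + 35 = 45
```

With a ceiling like -e 2000, many low-quality mismatches can be tolerated, which is why it behaves very differently from a fixed mismatch count.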
The random assignment of ambiguous reads could easily have been turned off with -m 1, just as they did with the CLC program (-r ignore).
RE: Collaboration with tool authors required
lh3lh3 replied to idot on 14 Aug 2009 at 10:20 GMT
Agreed. To evaluate aligners, one must fully understand how each aligner works. Sometimes even the developer him/herself is not sure about the behavior of his/her own aligner, let alone others.
For bowtie, one can discard repetitive hits by running it with -m 1, or with --best -k 2 and filtering afterwards. I think it is right to use -e; using -v 3 or more is inefficient. For maq, one can simply set a threshold on mapping quality to discard repetitive hits.
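The mapping-quality filter suggested above for maq amounts to a simple threshold: ambiguously placed (repetitive) reads receive mapping quality 0, so dropping anything below a cutoff removes them. The record structure and field names below are invented for illustration, not maq's actual output format.

```python
# Hypothetical alignment records; "mapq" stands in for the mapping-quality
# column an aligner would report (0 = ambiguous/repetitive placement).
alignments = [
    {"read": "r1", "pos": 100, "mapq": 37},  # uniquely placed
    {"read": "r2", "pos": 250, "mapq": 0},   # repetitive hit
    {"read": "r3", "pos": 512, "mapq": 23},
]

MAPQ_THRESHOLD = 1  # keep only reads with non-zero mapping quality

kept = [a for a in alignments if a["mapq"] >= MAPQ_THRESHOLD]
print([a["read"] for a in kept])  # ['r1', 'r3']
```

Raising the threshold (e.g. to 20 or 30) trades sensitivity for confidence in the reported placement.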
In addition, for table 2, the authors should map more than 1,000,000 reads to evaluate speed; SeqMap and maq are highly inefficient given only 100,000 reads. I do not know how CLC Bio NGS behaves.