Reader Comments

Blog post regarding results

Posted by macmanes on 31 Dec 2013 at 13:11 GMT

I've written a short blog post about this paper, specifically about the interpretation of the results. Would be great to get a response from the authors. Link: http://genomebio.org/is-t...

No competing interests declared.

RE: Blog post regarding results

giorgilab replied to macmanes on 01 Jan 2014 at 01:41 GMT

Thank you, Dr. MacManes, for the interesting point you raised in your blog.
We do believe trimming is beneficial for RNASeq, at least under the parameters we measured. As we say, the decision to trim RNASeq reads is usually a tradeoff between the percentage of mapped reads and the number of surviving reads. Measuring the correctness of mapping directly (i.e. the number of reads that align correctly) is challenging, because it is really hard to define "true positives" in such a scenario. First, there is intrinsic uncertainty in the genome sequence itself and in the divergence of the sequenced organism from the reference. On top of that come various kinds of uncertainty from contaminant reads (environmental, pathogens/symbionts, or derived from human operators), library preparation biases, and plain sequencing errors. We considered including multiple definitions of "correct RNASeq", but found that the only unbiased way to assess this is through the "mappability" of the surviving reads. Under this definition, a higher mapping percentage means the trimmer is increasing the "quality" of the surviving population. Once again, it is a tradeoff between quality and quantity (the size of the post-trimming read population).
We agree that a hard Q20-Q30 threshold is not a golden rule for trimming RNASeq reads, and that it should be tuned to the overall quality of the dataset (i.e. a high-quality dataset calls for a slightly more stringent, higher Q threshold).
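To make the Q-threshold point concrete, here is a minimal Python sketch of threshold-based 3' quality trimming. It assumes Phred+33 quality encoding and uses a deliberately simple rule: cut trailing bases scored below the threshold, then discard reads shorter than a minimum length. The function names and the default `min_length=36` are hypothetical illustrations, not the trimmers evaluated in the paper. Raising the threshold shortens and drops more reads, which is the quality-versus-quantity tradeoff described above.

```python
from typing import List, Tuple


def trim_trailing_low_quality(seq: str, qual: str, threshold: int = 20,
                              offset: int = 33) -> Tuple[str, str]:
    """Cut bases from the 3' end while their Phred score is below threshold.

    Assumes Phred+33 encoding (offset=33); this is a simplified trailing
    trim, not a sliding-window algorithm.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    end = len(qual)
    # Walk back from the 3' end until a base meets the threshold.
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


def filter_reads(reads: List[Tuple[str, str]], threshold: int = 20,
                 min_length: int = 36) -> List[Tuple[str, str]]:
    """Trim each (seq, qual) pair and keep reads at least min_length long.

    min_length=36 is a hypothetical default chosen for illustration.
    """
    kept = []
    for seq, qual in reads:
        s, q = trim_trailing_low_quality(seq, qual, threshold)
        if len(s) >= min_length:
            kept.append((s, q))
    return kept
```

For example, a read with qualities `"IIII5555"` (Q40 followed by Q20) survives intact at Q20 but is cut to four bases at Q30, so a stricter threshold yields fewer, higher-quality surviving bases.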
Also, I believe the effects of trimming will be much more evident when RNASeq reads are used beyond the standard "gene counting" task. For example, what is the effect of trimming on SNP calling from RNA reads (useful, e.g., for allele-specific expression studies)? Or on the more widely used transcriptome assembly? We considered including a comparative analysis of transcriptome assembly quality, but preliminary results suggested it would look similar to the DNASeq results. However, we do believe such an extension of our study is necessary to assess the broad applicability of trimming as a standard procedure for NGS data processing.

Competing interests declared: I am the corresponding author of this paper.

RE: RE: Blog post regarding results

macmanes replied to giorgilab on 01 Jan 2014 at 11:57 GMT

Thanks for the response Federico!

Generally speaking, RNAseq studies aim to understand something about transcripts: their DNA/AA sequence, abundance, and isoform use, to name a few. Which of these outcomes is improved by aggressive trimming? The percentage of reads mapping may certainly indicate something about the quality of the reference or the dataset, but how this metric relates to the goals of an RNAseq study is much less clear.

Re your point about transcriptome assembly: aggressive trimming is detrimental to assembly. This relationship is pretty clear: http://biorxiv.org/conten...

No competing interests declared.