Reader Comments
Important to point out cases where similarity networks might perform poorly
Posted by jeisen on 13 Feb 2009 at 15:38 GMT
cannot be used as a basis for inferring evolutionary history.
http://plosone.org/article/info:doi/10.1371/journal.pone.0004345#article1.body1.sec2.sec2.p3
I really like the use of similarity networks here for handling, annotating, and visualizing large sequence data sets. However, I think it is important to go a bit beyond the discussion here and point out the cases in which similarity networks might perform poorly. In particular, as I and others have shown for years, similarity-based metrics in general perform poorly when there has been significant variation in rates of evolution (e.g., see my 1998 paper on "Phylogenomics", http://genome.cshlp.org/content/8/3/163, as well as many others). Similarity metrics do not perform well here for the reasons you discuss: they have no underlying model of evolution that would allow them to incorporate patterns of rate variation. Another situation where similarity metrics can perform poorly is when there is significant homoplasy (e.g., convergent evolution) in the sequences. Many phylogenetic methods are better able to deal with this than similarity metrics.
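To make the rate-variation point concrete, here is a minimal toy sketch in Python. It is not taken from the article or from the 1998 paper. The four-taxon tree, the taxon names, and the branch lengths are all hypothetical, chosen only so that two lineages (B and D) evolve ten times faster than their true sisters (A and C). The tree-aware check used for contrast is the classical four-point condition, which assumes perfectly additive distances; real data are noisier.

```python
from itertools import combinations

# Hypothetical unrooted tree ((A,B),(C,D)); lengths in arbitrary integer units.
# Assumption for illustration: B and D evolve ten times faster than A and C.
TERMINAL = {"A": 10, "B": 100, "C": 10, "D": 100}
INTERNAL = 10  # the single internal branch separating (A,B) from (C,D)
TRUE_SISTER = {"A": "B", "B": "A", "C": "D", "D": "C"}


def path_length(x, y):
    """Tree (patristic) distance between two taxa on the toy tree."""
    d = TERMINAL[x] + TERMINAL[y]
    if TRUE_SISTER[x] != y:
        d += INTERNAL  # non-sisters are also separated by the internal branch
    return d


# Pairwise distance matrix, standing in for (inverse) similarity scores.
dist = {frozenset(p): path_length(*p) for p in combinations("ABCD", 2)}


def nearest_neighbor(x):
    """Similarity-style answer: the taxon with the smallest distance to x."""
    others = [t for t in TERMINAL if t != x]
    return min(others, key=lambda y: dist[frozenset((x, y))])


def four_point_split():
    """Tree-aware answer: the pairing with the smallest sum of within-pair distances."""
    splits = [
        (("A", "B"), ("C", "D")),
        (("A", "C"), ("B", "D")),
        (("A", "D"), ("B", "C")),
    ]
    return min(splits, key=lambda s: dist[frozenset(s[0])] + dist[frozenset(s[1])])


print("Nearest neighbor of A:", nearest_neighbor("A"))
print("True sister of A:", TRUE_SISTER["A"])
print("Four-point split:", four_point_split())
```

In this toy case the most similar sequence to A is C, even though A's true sister is B, while the four-point comparison still recovers the A,B versus C,D grouping. A similarity network built from these distances would draw its strongest edge between two slowly evolving sequences that are not each other's closest relatives.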
Again, this is not to say that the networks used here are a bad idea. When I saw a talk by the lead author, we immediately started playing with Cytoscape to look at large gene families. But it is important to understand how and why they can give a misleading picture of the underlying sequence clusters and groupings.