Reader Comments

Important to point out cases where similarity networks might perform poorly

Posted by jeisen on 13 Feb 2009 at 15:38 GMT

cannot be used as a basis for inferring evolutionary history.
http://plosone.org/article/info:doi/10.1371/journal.pone.0004345#article1.body1.sec2.sec2.p3

I really like the use of similarity networks here for handling, annotating, and visualizing large sequence data sets. However, I think it is important to go a bit beyond the discussion here and point out the cases in which similarity networks might perform poorly. In particular, as others and I have shown for years, similarity-based metrics generally perform poorly when there has been significant variation in rates of evolution (e.g., see my 1998 paper on "Phylogenomics", http://genome.cshlp.org/content/8/3/163, as well as many others). Similarity metrics do not perform well here for the reason you discuss: they have no underlying model of evolution that would allow them to incorporate patterns of rate variation. Another situation where similarity metrics can perform poorly is when there is significant homoplasy (e.g., convergent evolution) in the sequences. Many phylogenetic methods are better able to deal with this than similarity metrics.
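To make the rate-variation point concrete, here is a minimal toy sketch in Python. Everything in it is an assumption chosen for illustration: the four taxa, the branch lengths, and the helper names are hypothetical, and the four-point condition stands in for "a phylogenetic method" only because it is the simplest tree-aware criterion to write down. It is not the method used in the article or in my 1998 paper. The true tree is ((A,B),(C,D)), with B and D on long (fast-evolving) branches and A and C on short ones.

```python
# Hypothetical branch lengths (arbitrary units of expected substitutions)
# on the true unrooted tree ((A,B),(C,D)). B and D evolve fast, A and C slowly.
TERMINAL = {"A": 1, "B": 9, "C": 1, "D": 9}
INTERNAL = 1  # length of the internal branch separating (A,B) from (C,D)
TRUE_SISTER = {"A": "B", "B": "A", "C": "D", "D": "C"}


def tree_distance(x, y):
    """Path length between two taxa on the assumed true tree."""
    d = TERMINAL[x] + TERMINAL[y]
    if TRUE_SISTER[x] != y:
        d += INTERNAL  # path crosses the internal branch
    return d


def nearest_neighbor(x):
    """What a pure similarity metric reports: the taxon at the smallest distance."""
    others = [t for t in TERMINAL if t != x]
    return min(others, key=lambda y: tree_distance(x, y))


def four_point_split():
    """Tree-aware grouping: the pairing with the smallest sum of within-pair distances."""
    pairings = [
        (("A", "B"), ("C", "D")),
        (("A", "C"), ("B", "D")),
        (("A", "D"), ("B", "C")),
    ]
    return min(pairings, key=lambda p: tree_distance(*p[0]) + tree_distance(*p[1]))


if __name__ == "__main__":
    print("Most similar to A:", nearest_neighbor("A"))
    print("Four-point grouping:", four_point_split())
```

With these assumed numbers, A is most similar to C (distance 3) even though its true sister is B (distance 10), so a network built on a similarity cutoff would link the two slowly evolving sequences first. The four-point sums are 20 for AB|CD versus 22 for each alternative, so the tree-aware criterion still recovers the true grouping. Note that the sketch uses exact tree distances; with real sequences, the distances would first need to be estimated under a model of evolution.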

Again, this is not to say that the networks used here are a bad idea. When I saw a talk by the lead author, we immediately started playing with Cytoscape to look at large gene families. But it is important to understand how and why these networks can give a misleading picture of the underlying sequence clusters and groupings.