Using Sequence Similarity Networks for Visualization of Relationships Across Diverse Protein Superfamilies

Holly J. Atkinson; John H. Morris; Thomas E. Ferrin; Patricia C. Babbitt

doi:10.1371/journal.pone.0004345

Open Access

Peer-reviewed

Research Article

Holly J. Atkinson
Affiliations: Graduate Program in Biological and Medical Informatics, University of California San Francisco, San Francisco, California, United States of America; Institute for Quantitative Biosciences, University of California San Francisco, San Francisco, California, United States of America

John H. Morris
Affiliation: Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, United States of America

Thomas E. Ferrin
Affiliations: Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, United States of America; Department of Biopharmaceutical Sciences, University of California San Francisco, San Francisco, California, United States of America; Institute for Quantitative Biosciences, University of California San Francisco, San Francisco, California, United States of America

Patricia C. Babbitt
E-mail: babbitt@cgl.ucsf.edu (corresponding author)
Affiliations: Department of Biopharmaceutical Sciences, University of California San Francisco, San Francisco, California, United States of America; Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, United States of America; Institute for Quantitative Biosciences, University of California San Francisco, San Francisco, California, United States of America

Published: February 3, 2009
https://doi.org/10.1371/journal.pone.0004345

Reader Comments

Important to point out cases where similarity networks might perform poorly

Posted by jeisen on 13 Feb 2009 at 15:38 GMT

Referring to the passage "…cannot be used as a basis for inferring evolutionary history."
http://plosone.org/article/info:doi/10.1371/journal.pone.0004345#article1.body1.sec2.sec2.p3

I really like the use of similarity networks here for handling, annotating, and visualizing large sequence data sets. However, I think it is important to go a bit beyond the discussion here and point out the cases in which similarity networks might perform poorly. In particular, as I and others have shown for years, similarity-based metrics in general perform poorly when there has been significant variation in rates of evolution (e.g., see my 1998 paper on "Phylogenomics" http://genome.cshlp.org/content/8/3/163 as well as many others). Similarity metrics do not perform well here for the reasons you discuss: they have no underlying model of evolution that would allow them to incorporate patterns of rate variation. Another situation where similarity metrics can perform poorly is when there is significant homoplasy (e.g., convergent evolution) in the sequences. Many phylogenetic methods deal with this better than similarity metrics do.

Again, this is not to say that the networks used here are a bad idea. When I saw a talk by the lead author, we immediately started playing with Cytoscape to look at large gene families. But it is important to understand how and why they can give a misleading picture of the underlying sequence clusters and groupings.
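The rate-variation point in the comment above can be made concrete with a toy calculation. The sketch below is illustrative only and assumes a hypothetical four-taxon tree ((A,B),(C,D)) with made-up integer branch lengths, in which taxon A evolves much faster than the others; the names `LEAF_BRANCH`, `most_similar`, and `nj_first_pair` are hypothetical. Pairwise path length stands in for (inverse) sequence similarity. Picking each taxon's "most similar" partner attaches B to C rather than to its true sister A. The neighbor-joining Q criterion, Q(i,j) = (n-2)d(i,j) - r_i - r_j, is used here as one example of a tree method that does not assume equal rates; it is not itself a full model of sequence evolution.

```python
from itertools import combinations

# Hypothetical tree ((A,B),(C,D)); branch lengths are made up for illustration
# (e.g. substitutions per 100 sites). Taxon A sits on a long, fast-evolving branch.
LEAF_BRANCH = {"A": 90, "B": 10, "C": 10, "D": 15}
INTERNAL_BRANCH = 10
SISTER = {"A": "B", "B": "A", "C": "D", "D": "C"}


def tree_distance(x, y):
    """Path length between two leaves of the hypothetical tree."""
    if x == y:
        return 0
    d = LEAF_BRANCH[x] + LEAF_BRANCH[y]
    if SISTER[x] != y:
        d += INTERNAL_BRANCH  # non-sisters: the path also crosses the internal branch
    return d


TAXA = sorted(LEAF_BRANCH)
DIST = {(x, y): tree_distance(x, y) for x in TAXA for y in TAXA}


def most_similar(taxon):
    """Similarity-style grouping: the other taxon at the smallest distance."""
    return min((t for t in TAXA if t != taxon), key=lambda t: DIST[(taxon, t)])


def nj_first_pair(taxa, dist):
    """First pair joined by neighbor joining: minimise (n-2)d(i,j) - r_i - r_j."""
    n = len(taxa)
    r = {i: sum(dist[(i, k)] for k in taxa) for i in taxa}  # row sums of distances
    return min(combinations(taxa, 2),
               key=lambda p: (n - 2) * dist[p] - r[p[0]] - r[p[1]])


if __name__ == "__main__":
    for t in TAXA:
        print(t, "most similar to", most_similar(t), "| true sister:", SISTER[t])
    print("NJ joins first:", nj_first_pair(TAXA, DIST))
```

Here B is closer to C (distance 30) than to its sister A (distance 100), so a nearest-neighbor or threshold-style grouping misplaces it. Neighbor joining corrects for A's long branch via the row sums r_i: (A,B) and (C,D) tie for the minimum Q of -290, and both are the true sister pairs. These distances are exact and additive by construction; real sequence distances are noisy and, with homoplasy, need not be additive, so this only illustrates the direction of the effect.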

Subject Areas: Sequence alignment; Phylogenetic analysis; BLAST algorithm; G protein coupled receptors; Network analysis; Multiple alignment calculation; Hidden Markov models; Protein domains