Reader Comments

F1000 rating: Exceptional (10)

Posted by cscbio on 25 Feb 2012 at 01:55 GMT

http://f1000.com/13916956...

By Roy Kishony and Morten Ernebjerg, Harvard University

This article provides a groundbreaking approach to addressing a long-standing grand challenge of molecular and computational biology – the prediction of 3D protein structures from primary amino acid sequences – by linking evolutionary constraints on sequence divergence to the physical protein structure.
With the advent of 'cheap and deep' sequencing, thousands of new protein sequences are emerging, many of them from families with unknown structure and no structural homologs {1}. This has brought new urgency to the thorny problem of predicting protein structure computationally without recourse to crystal structures. In this article, the authors make substantial progress towards this goal by exploiting the fact that, although a single protein sequence may not offer enough guidance for simple folding methods to work, the evolutionary record embodied in an extant protein family often does.

The core observation is that residues in close physical proximity in a folded protein tend to co-vary across protein families, reflecting the fact that evolutionary pressures on the protein structure force such residues to change in a coordinated fashion {2} (a fact that also underlies the exciting discovery that proteins can be decomposed into spatially contiguous evolutionary 'sectors' with distinct functions {4}). To overcome noise and spurious correlations, the authors apply a newly developed method {3} to efficiently translate an aligned protein family into a so-called maximum entropy distribution over all sequences in the family. Encoded in this distribution are linkage strengths between all residue pairs, and these strengths are derived from a global model of the entire sequence rather than from the frequencies of the two residues in question alone. Such global linkages have been shown to provide an excellent guide to the physical proximity of residues {2,3}. The key insight is that knowing which residues are close in space is tantamount to knowing the rough shape of the folded structure. The authors demonstrate that once the predicted residue proximities are encoded as geometrical constraints on the protein structure, standard folding algorithms can produce excellent structure predictions even for long sequences (>200 residues), which had previously been out of reach for computational methods.
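To make the idea of "global" linkage strengths more concrete, here is a minimal Python/NumPy sketch of the pipeline from alignment to distance restraints. It assumes the mean-field approximation described in {3}: the alignment columns are one-hot encoded, their covariance matrix is regularised with a pseudocount, and pair couplings are read off as the negative inverse of that matrix. Inverting the whole matrix is what makes the linkages global, because each pair's coupling is corrected for correlations that are transmitted through other residues. This is a simplified sketch, not the authors' implementation. Sequence reweighting is omitted, and pairs are ranked by the Frobenius norm of each coupling block instead of the direct-information score used in {3}. All function names, the 0.5 pseudocount weight, the `min_sep` default, and the 8 Å restraint bound are illustrative assumptions.

```python
import numpy as np

# Alphabet: gap symbol plus the 20 standard amino acids (q = 21 states).
AMINO = "-ACDEFGHIKLMNPQRSTVWY"
Q = len(AMINO)


def encode(msa):
    """Map equal-length aligned sequences to an (M, L) integer array.

    Unknown symbols (e.g. 'X', '.') are treated as gaps for simplicity.
    """
    idx = {a: k for k, a in enumerate(AMINO)}
    return np.array([[idx.get(c, 0) for c in seq.upper()] for seq in msa])


def coupling_scores(X, pseudocount=0.5):
    """Mean-field coupling estimate; returns an (L, L) matrix of pair scores."""
    M, L = X.shape
    onehot = np.zeros((M, L, Q))
    onehot[np.arange(M)[:, None], np.arange(L)[None, :], X] = 1.0

    # Single-site and pair frequencies, mixed with a uniform distribution
    # (pseudocount) so that the covariance matrix below is invertible.
    fi = (1 - pseudocount) * onehot.mean(axis=0) + pseudocount / Q
    fij = (1 - pseudocount) * np.einsum("mia,mjb->iajb", onehot, onehot) / M
    fij += pseudocount / Q**2
    for i in range(L):  # a column can only co-occur with its own state
        fij[i, :, i, :] = np.diag(fi[i])

    # Connected correlations; drop the last state of each column
    # (a gauge choice) so the matrix is not singular.
    C = fij - np.einsum("ia,jb->iajb", fi, fi)
    n = L * (Q - 1)
    Cr = C[:, : Q - 1, :, : Q - 1].reshape(n, n)

    # Mean-field approximation: couplings e_ij(a, b) = -(C^-1)_{ia, jb}.
    e = np.zeros((L, Q, L, Q))
    e[:, : Q - 1, :, : Q - 1] = -np.linalg.inv(Cr).reshape(L, Q - 1, L, Q - 1)

    # Shift every q x q block to the zero-sum gauge, then score each pair
    # by the Frobenius norm of its block (simpler than direct information).
    e = (e - e.mean(axis=1, keepdims=True) - e.mean(axis=3, keepdims=True)
         + e.mean(axis=(1, 3), keepdims=True))
    S = np.sqrt((e**2).sum(axis=(1, 3)))
    np.fill_diagonal(S, 0.0)
    return S


def predicted_contacts(S, n_top, min_sep=5):
    """Return the n_top highest-scoring pairs (i, j) with j - i >= min_sep."""
    L = S.shape[0]
    pairs = [(i, j) for i in range(L) for j in range(i + min_sep, L)]
    pairs.sort(key=lambda p: S[p], reverse=True)
    return pairs[:n_top]


def distance_restraints(contacts, upper=8.0):
    """Hypothetical restraint format: (i, j, upper bound in angstroms)."""
    return [(i, j, upper) for i, j in contacts]
```

On a toy alignment in which columns 0 and 3 always carry complementary charged residues (K with E, D with R) and the other columns are random, the top-ranked pair should be `(0, 3)`. For a real family, `X` would come from a deep multiple sequence alignment, and the restraints would be passed to a structure-generation tool. The dense `(L, q, L, q)` pair-frequency array grows quadratically with sequence length, which is acceptable for a sketch but wasteful for large proteins.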

One of the great achievements of this article is that it successfully combines multiple techniques, observations, and ideas to address the major goal of protein structure prediction. It brings together theoretical advances in the calculation of maximum entropy distributions, existing protein folding tools, and new biological ideas to forge a practical structure-prediction pipeline that translates a protein family into a predicted structure, using only the protein sequences as input and requiring only a standard laptop computer. As such, the paper provides a tool that could prove crucial in translating the flood of new sequences into new structures, and possibly also in predicting function.
References:
{1} Yooseph et al. PLoS Biol 2007, 5:e16 [PMID:17355171].
{2} Weigt et al. Proc Natl Acad Sci USA 2009, 106:67-72 [PMID:19116270].
{3} Morcos et al. Proc Natl Acad Sci USA 2011, 108:E1293-301 [PMID:22106262].
{4} Halabi et al. Cell 2009, 138:774-86 [PMID:19703402].

No competing interests declared.