Skip to main content
Advertisement
Browse Subject Areas
?

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

A Topological Framework for the Computation of the HOMFLY Polynomial and Its Application to Proteins

  • Federico Comoglio,

    Current address: Department of Biosystems Science and Engineering (D–BSSE), ETH Zurich, Basel, Switzerland

    Affiliation Department of Chemical, Food, Pharmaceutical and Pharmacological Sciences (DiSCAFF), University of Piemonte Orientale “Amedeo Avogadro”, Novara, Italy

  • Maurizio Rinaldi

    rinaldi@pharm.unipmn.it

    Affiliation Department of Chemical, Food, Pharmaceutical and Pharmacological Sciences (DiSCAFF), University of Piemonte Orientale “Amedeo Avogadro”, Novara, Italy

Abstract

Polymers can be modeled as open polygonal paths and their closure generates knots. Knotted proteins detection is currently achieved via high-throughput methods based on a common framework insensitive to the handedness of knots. Here we propose a topological framework for the computation of the HOMFLY polynomial, an handedness-sensitive invariant. Our approach couples a multi-component reduction scheme with the polynomial computation. After validation on tabulated knots and links the framework was applied to the entire Protein Data Bank along with a set of selected topological checks that allowed to discard artificially entangled structures. This led to an up-to-date table of knotted proteins that also includes two newly detected right-handed trefoil knots in recently deposited protein structures. The application range of our framework is not limited to proteins and it can be extended to the topological analysis of biological and synthetic polymers and more generally to arbitrary polygonal paths.

Introduction

The topological study of biological polymers has led to important insights into their structural properties and evolution [1], [2]. From a topological point of view polymers can be naturally modeled as sequences of 3D points, i.e. open polygonal paths. Their closure generates classical objects in topology called knots. The simplest knot is the trefoil knot, illustrated in Figure 1A. The characterization of knotted proteins, due to their close structure-function relationship and reproducible entangled folding, is a subject of increasing interest in both experimental and computational biology.

thumbnail
Figure 1. A knot diagram and illustration of the Conway skein triple.

(A) Three dimensional polygonal representation of the trefoil knot (in red) and its planar diagram (in black). Two red spheres on the knot mark the 3D points and projecting down to on the planar diagram along the brown arrow. (B) The Conway skein triple is composed of three oriented diagrams that are the same outside a small region, where they look like the illustrated , and . To define the oriented sign of a crossing, approach it along the underpass in the direction of the orientation: if the overpass orientation runs from left to right, the oriented sign is , otherwise.

https://doi.org/10.1371/journal.pone.0018693.g001

Knots investigation was initially fostered by the discovery of knotted circular single-stranded DNA [3] and has been followed by the study of the underlying enzymatic mechanisms [4], [5] and more recently by the description of the topological organization and packing dynamics of bacteriophage P4 genome [6], [7].

Despite those great advances in knotted DNA studies, we are only beginning to go deeper into protein knots characterization and the understanding of their biological role. After the pioneering work of Mansfield [8] and the definition of topological descriptors for the analysis of protein symmetries and proteins classification [9][11], the detection of knots in proteins was boosted by Taylor's work [12]. The exponential growth of the total number of structures deposited into the Protein Data Bank (PDB, http://www.pdb.org) [13] requires dedicated computational high-throughput methods able to deal with a large amount of data [14]. These methods combine a structure reduction scheme of a protein backbone model with the computation of a knot invariant, the Alexander polynomial [9], [15][17]. Hereinafter with the term reduction we refer to a stepwise deletion of a certain number of points from the original structure (endpoints excluded) that preserves its ambient isotopy class.

The most affirmed reduction algorithm is the KMT reduction scheme. KMT owes its name to the different algorithms proposed by Koniaris and Muthukumar [18] and Taylor [12], [19]. Since the use of this acronym has engendered a little confusion on which algorithm is precisely being used in literature we will explicitly refer to them by authors' names. Globally, these methods are based on the concept of elementary deformation [20], [21], which consists in the replacement of two sides of a triangle with the third provided that the triangle is empty. In particular while Koniaris and Muthukumar's algorithm essentially reproduces the ideas of Alexander-Briggs and Reidemeister, in the Taylor's algorithm (which Taylor himself considers a smoothing algorithm) the elementary deformation is done in steps that progressively smooth the chain at the cost of introducing points not belonging to the protein backbone; the edge replacement depends on some selected conditions [19] chosen to prevent numerical problems.

Once the reduction has been accomplished knot type identification can be performed. This can be done either by visual inspection or by computing a polynomial invariant. Being easy to compute the Alexander polynomial represents the current default choice. This is also supported by the evidence that protein knots detected to date are the simplest ones as illustrated in Figure 2. Unfortunately, the Alexander polynomial does not distinguish a knot from its mirror image. Thus, for instance left- and right-handed trefoil knots share the same polynomial. Instead, more powerful invariants are able to determine knots chirality.

thumbnail
Figure 2. Knots met in proteins.

Illustration of the knots found in proteins, labeled according to Rolfsen names. U: the simplest knot, the unknot. : the trefoil knot and its mirror image, denoted by the , has three crossings. : the figure-eight knot is the only knot with four crossings. : the three-twist knot has five crossings. : the Stevedore's knot, the most complex knot detected in proteins.

https://doi.org/10.1371/journal.pone.0018693.g002

Whereas to define the handedness of the simplest knot types is straightforward, its extension to more complex knots requires carefulness. However, for the purpose of this article, a knot is chiral if its mirror image and the knot itself belong to two different ambient isotopy classes and it is achiral otherwise. We define the handedness of knots according to [22] adopting the conventional values reported in the Atlas of Oriented Knots and Links (http://at.yorku.ca/t/a/i/c/31.htm).

As far as proteins are concerned, the handedness of protein knots was only partially addressed so far.

Taylor points out the existence of both right- and left-handed trefoil knots, with a neat right-handed preference [2]. This hypothesis was supported by the finding that all trefoil knotted proteins belong to the SCOP [23] class, where an intrinsic right-handed preference for unit connections exists. The only left-handed trefoil knot was detected in the ubiquitin C-terminal hydrolases (1 cmx) considered afterwards as an incomplete five crossings knot. However, by considering individual fragments the knot vanishes. A more recent work that removed sequence redundancy, intriguingly highlights a global 5 to 3 balance between right-handed and left-handed knots, not suggesting a bias for one of the two hands [24].

In order to compute invariants able to cope with knots chirality, here we propose a novel topological framework to compute arbitrary skein polynomials. A skein polynomial respects the skein relation:(1)which is an algebraic relation connecting the configurations in a Conway skein triple [25] (see Figure 1B), namely it verifieswhere the coefficients have to satisfy some relations. For instance, the choice , leads to the HOMFLY polynomial [26]. By further specializing and one obtains the Jones polynomial whereas setting and leads to the Alexander polynomial . As far as proteins are concerned, the handedness of protein knots was previously addressed by King et al. [27] and relies on the computation of the Jones polynomial.

Although this appears to be enough to define the chirality of the currently detected knotted proteins, the HOMFLY polynomial is more powerful. For instance, whereas the Jones polynomial is the same for knots 10-022 and 10-035 of the Rolfsen table, the HOMFLY polynomial is able to discriminate them. In the realm of our method, other choices bring to the Vassiliev knots invariants [28], [29] considered for instance by [30].

Generally, the skein relation does not preserve the multiplicity of a link. For example if is a knot, will be a two components link. The recursion of the skein relation together with the values of the given polynomial on the unknot allows to reconstruct the polynomial of any given link. Therefore, the complexity of the polynomial computation grows exponentially with the number of crossings to be processed. Our algorithm relies on the iteration of the skein relation and explicitly constructs the Conway skein triple associated to a given crossing by a stepwise insertion of auxiliary points.

In order to deal with multi-component links and speed up computations, the polynomial computation is preceded by the application of a structure reduction scheme, which we call MSR (Minimal Structure Reduction). The MSR algorithm exploits the interplay between the 3D structure and the corresponding 2D planar diagram of a polygonal path and basically relies on a 3D operation, namely the Generalized Reidemeister Move (GRM). While the Alexander-Briggs method intrinsically removes at most one point at each step, a GRM does not necessarily operate locally, usually leading to a dramatic reduction of the number of points in few steps.

The effectiveness and robustness of the proposed framework were initially evaluated on tabulated knots and links, leading to an HOMFLY polynomial repository along with knots orientation details. We then applied our methods to protein structures. By screening the entire PDB (version of November 8, 2010), we obtained an up-to date table of knotted structures that also includes two newly detected right-handed trefoil knots.

Methods

Basic concepts and definitions

To make this article self-contained, herein we introduce and briefly describe basic concepts and definitions.

  • Polygonal paths A pair where is a collection of points in and is an ordered subset of (the integers in ) with determines a collection of polygonal paths in as follows: the -th path (or component) is generated by connecting the points indexed by .The edges of the polygonal paths are the oriented segments with .
  • A collection of polygonal paths in is simple if each edge of the path intersects precisely the previous and the next edge at the endpoints [31].
  • Polygonal link A collection of simple polygonal paths is a polygonal link. The components of are not necessarily closed. For the sake of convenience, a subpath will be defined by indexing with square brackets.
  • Regular Projection A projection of a polygonal link is regular if the following conditions are satisfied:
    1. The image has at most a finite number of double points (crossings).
    2. No vertex is a double point.
      A link diagram is a regular projection of the link whose graphical representation adopts solid edges and gaps to indicate overcrossings and undercrossings respectively (see Figure 1A). With a slight abuse of language we will also call under/over crossings the points in that project to an over/under crossing in .
  • Intersection signs Given two sets of edges and we can compute the intersection matrix by setting(2)If we get an antisymmetric square matrix and we can simplify the notation to . Intersection signs definition is detailed in Text S1.
  • Minimal structure A minimal structure for a polygonal link is a nested sequence of subsets of that cannot be extended. Each inclusion corresponds to a Generalized Reidemeister Move, described below.

Structure reduction algorithm

Our reduction algorithm MSR iteratively exploits the subroutine GRM, which performs a Generalized Reidemeister Move according to the following scheme:

Step 1: Move candidate selection, namely a subpath of .

Step 2: Move contraction , which is the provisional replacement in of with the segment connecting the endpoints of .

Step 3: Check that and belong to the same ambient isotopy class. If so, the replacement described in Step2 becomes effective.

While the first two steps are trivial, Step3 requires the study of the intersections of the move candidate with the remainder of . is characterized by its initial and final edge indices, respectively and and belongs to a specified component, say of .

The complement can be splitted in , the link components different from and , the open link with at most two components given by and . Let be the set of signs of and analogously be the set of signs of .

The topological check in Step3 requires the evaluation of the three following conditions:

() is ascending or descending (Triviality of ).

() contains at most one element (Separability of from ).

() The set contains at most one element (Concordance of and with respect to ).

If conditions hold, we call the replacement of with (and vice versa) a Generalized Reidemeister Move. A GRM is an equivalence relation for polygonal links. An example of an admissible move is illustrated in Text S2.

Given a polygonal link , its intersections matrix and the move initial index , the GRM algorithm performs the following operations:

The key point of the algorithm is the construction of the intersection matrix from (line 16) simply by replacing the rows and columns of with the vectors and respectively. Notably, this procedure greatly reduces the computational cost with respect to an explicit matrix computation.

We are now ready to introduce MSR. Given a polygonal link and an iteration limit (suitable to achieve a partial reduction) MSR operates as follows:

Skein polynomials computation

In the following the interplay between three and two dimensions plays a fundamental role and it is realized through the standard projection . Since restricted to is invertible up to a finite number of double points, we denote with an uppercase letter objects of and with the corresponding lowercase letter their projection. Counter images of double points are distinguished by subscripts. Obviously, any subpath in the projection has a unique lift to and therefore in the following we adopt a two dimensional description.

Given a polygonal oriented link, we consider two oriented edges and such that their projections and cross at a point . For the sake of convenience we assume that lays under and we respectively denote by and their points projecting down to . The edges and give rise to a skein configuration of type or .

We implemented the Skein Relation on the 3D structure of by construction of the corresponding skein configurations and . With we refer to the switching of the crossing under consideration. Our algorithm performs the following steps (illustrated in Figure 3):

Step 1: Construct an empty quadrilateral containing whose vertices belong to and .

Step 2: Rotate in 2D to get and provisionally change getting (by means of the just introduced lift operation).

Step 3: Check that and are topologically equivalent.

thumbnail
Figure 3. Example of geometric construction of the skein configurations.

(A) Figure-eight polygonal knot diagram. Knot orientation and the crossing between the edges and are shown. (B) A clean quadrilateral around is shown in red. (C) The rotated quadrilateral (solid blue lines) is obtained by rotating (dashed red lines) along the axis. (D) Triangles to be analyzed in the topological check are shaded in green. The points and are reported respectively in red and blue. (E) The configuration, with the path highlighted in black (F) The configuration. Solid lines highlight new connections (in red) and (in blue).

https://doi.org/10.1371/journal.pone.0018693.g003

Quadrilateral Construction.

The edges and are divided in two cut edges by the crossing (see Figure 3A). We construct a quadrilateral with vertices on the four cut edges such that it contains no other edges of the polygonal link projection (clean quadrilateral, see Figure 3B). We consider the four parametric half lines with parameters , leaving from along the four cut edges

For a given value of the parameter vector we get vertices of a quadrilateral . The vertices follow the order . To construct a clean quadrilateral we proceed as follows:

  1. Initialize by setting each .
  2. Construct the quadrilateral and compute the list of distances .
  3. Check the cleanness of via the Xclean algorithm (described below).
  4. If is not clean, consider and iteratively reduce by half the parameter associated with the longest cut edge having intersections (which we call ).

Xclean algorithm: Given an oriented -polygon and a polygonal link we can construct a table of status of the vertices. Each row of is a pair summarizing the intersections of the side entering and leaving the vertex as follows: we assign 0 if the relevant side has no intersections with and 1 otherwise.

Xclean needs a given quadrilateral , a link projection, a table (the putative status list) and a set indexing the vertices whose relevant sides have to be checked. The algorithm simply recomputes the indexed rows of and updates subsequently the adjacent rows.

Quadrilateral Rotation.

As a result of the previous algorithm we end up with a clean quadrilateral , whose vertices lie on and . By inserting in the lift of these vertices as auxiliary points we will run into technical problems due to parallel edges. To overcome this problem we generate a new quadrilateral by rotating of a suitable angle around (Figure 3C) via the the following steps:

  1. Set equal to the minimum angle between the vectors and .
  2. Initializewhere is chosen such that an edge (e.g. ) does not bridge the starting position of the other edge (e.g. ).
  3. Construct .
  4. Check the cleanness of through the Xclean algorithm.
  5. If not, iteratively reduce by half until become clean. Given we can construct by considering the triangle (see Figure 3D) and replacing the original cut edges with the path (two-side replacement), with

Topological Check.

The feasibility of the replacement of with is not obvious and requires a careful check, which is accomplished analyzing the newly introduced connections. The triangle is subdivided in two triangles by the segment . The absence of intersections in the segments and is guaranteed by the cleanness of and .

We approve the two-side replacement if and only if:

  1. The edge has no intersections.
  2. The segments and intersect the same edges of preserving intersections order and signs.

Otherwise the rotation angle is reduced by half and we loop back to Step2.

Construction of the Skein Configurations.

The construction of the skein configurations requires a distinction between and .

To construct we initially take the specular image of the undercross with respect to the overcross . By replacing the edge with the path we obtain a switched crossing but the projection is not regular anymore. Thus, we slightly perturb by attracting it toward via the formula

The constraint on guarantees that the projection of has no intersections with , while the projection has one intersection with but it is not always an overpass. If not, we reduce the perturbation via the iterative formulawhose convergence to 1 guarantees that we will eventually obtain an overpass. We set the initial value to 0.9.

Given , to construct we replace in the edge with the path (see Figure 3E). Notice that the edge is not affected by this construction.

Instead, the construction of make a full use of by substituting in the edges and with the connections and (Figure 3F). Obviously, this determines a shift of the separator indices and of the numbering of the points following . The case where and belong to the same component of is treated differently from the case where they belong to different components. In the former, the number of components of the link increases while in the latter it decreases.

Skein recursion.

We will apply recursively the skein relation (1) to reduce a given polygonal link to a collection of trivial links, systematically switching the undercrossings.

We adopt a greedy approach in which at each recursion we switch the undercrossing leading to the structure with the lowest number of points and we accordingly produce the relevant configuration.

In order to speed up computations, at each step the configurations are reduced with MSR. The resulting structures are stored as nodes in a skein tree, a binary ordered tree rooted at the original link.

Our goal is to assign to every node a pair of weights where is precisely the skein sign of the crossing of to be switched and is the link polynomial of . Notice that while is known, needs to be computed. We adopt a dynamic bottom-up procedure in which starting from leaves we attach to inner nodes.

Leaves are the simplest nodes since given a leaf , is known a priori being the polynomial of the -components unlink and there is no undercrossing left (). In the skein tree, every inner node has two children, say and , and can be computed via the recursion formula

In this way, the polynomial is simply the weight of the root.

Results and Discussion

Validation on tabulated knots and links

Initially, we validated our methods by computing the HOMFLY polynomial of both full structures and minimal stickies representations of tabulated polygonal knots and links. We compared our results with a polynomial repository constructed as described in Text S1. Since standard repositories do not address orientation and chirality, a single polynomial is associated to a given structure and a computed polynomial could not directly match repository entries. Thus, for each tabulated structure we considered mirror images along with all possible orientations (together referred to as flips) and computed the corresponding polynomials. At least one of them matched the one reported in the polynomial repository. Our complete repository of knots up to 10 crossings and oriented links up to 4 components could be browsed at http://www.pharm.unipmn.it/rinaldi/knots/index.php.

As described above, our HOMFLY polynomial computation associates a skein tree to every knot or link, by means of a greedy selection of the crossing to be switched. To verify the goodness of this choice we compared it with a fixed choice variant, which systematically switches the first -1 crossing encountered. We applied both algorithms to every knotted structure in the repository (including flips), characterizing each tree with two complexity indices, namely the level (corresponding to the number of generations, ) and the number of tree nodes . Figure 4 shows the behavior of as a function of , with dashed curves representing theoretical constraints. The growth curves of the two algorithms obtained via ANCOVA after linearization are significantly different, showing that the greedy algorithm performs generally better than the fixed choice one. This result is also supported by the evidence that the number of levels and configurations required for polynomial computation is significantly lower for the greedy choice (Wilcoxon test on the pairwise differences, ). Notably, the shrinking of the tree well compensates the extra computational time required by the greedy choice and this particularly suggest the usage of this algorithm as structure complexity increases. In general, it is possible to find a time threshold such that by filtering computational times accordingly, a significant difference emerges supporting greedy choice. This suggested the adoption of the greedy algorithm for the reduction of protein structures.

thumbnail
Figure 4. The Increase of the number of tree nodes as a function of tree levels.

Trees of both greedy (white/black) and fixed choice (gray) algorithms have been clustered according to the number of levels (). For each cluster a box plot of the nodes number has been drawn with a width proportional to the cluster size. Solid power curves fit the reported data. Dashed red and blue curves represent respectively lower and upper estimates of node numbers. Curve expressions are shown in the legend.

https://doi.org/10.1371/journal.pone.0018693.g004

Application to protein structures

We applied our algorithms to all the protein structures deposited in the PDB. Each entry was preprocessed as described in Methods and the HOMFLY Polynomial was computed on the MSR reduced structures.

Globally, we found 119 knotted proteins (226 parts) of the five knot types shown in Figure 2, belonging to the ten previously well defined classes of knotted foldings [14], [24]. A summary table of knots for each knot type along with the relevant HOMFLY polynomial is reported in Table 1. For a complete list of knotted proteins ID and part details see Table S1.

Although redundancies with previous studies [14], [16], [24] are largely present, the number of knotted proteins is lower than what previously reported. This is mainly due to topological checks and distance controls (see also Text S1) that allowed to discard nonstandard PDB formats and entries having large structural gaps due to missing residues. These proteins are often detected as knotted when gaps are connected by straight lines, inducing artificial entanglement.

thumbnail
Table 1. Total knotted entries detected for each knot type.

https://doi.org/10.1371/journal.pone.0018693.t001

Among newly detected knotted proteins, two right-handed trefoil knots were identified in two recently deposited structures. The first one has been found in the human Carbonic Anhydrase VII (CA7), isoform 1 (3 mdz) (see Figure 5A), whereas second one has been detected in the uncharacterized ORF from Sulfolobus Islandicus rudivirus 1 (2x4i) (Figure 5B), a virus of the extremely thermophilic archaeon Sulfolobus. Notably, although the latter protein still needs to be fully characterized to define its relevance, it shares more than 50% of its primary sequence with protein B116 (2j85) of Sulfolobus turreted icosahedral virus, which King et al [27] previously reported to contain a slip-knot. Thus, it is not surprising that the structure of 2x4i also contains a slip-knot, as we confirmed by visual inspection. Moreover, this protein presents a gap toward its C-terminus. Since we treat gaps as chain terminators (see Text S1) what we have detected is the knotted core of the slip-knot, illustrated in Figure 5B. The trefoil knot in the CA7 belongs instead to the well known right-handed trefoil knotted Carbonic Anhydrase superfamily. Knotted core analysis, performed as reported in [12], [17], reveals that both knots have a quite shallow nature. While a trimming of 28 and 5 residues from the N-terminus and C-terminus respectively is sufficient to unknot the Carbonic Anhydrase VII, the uncharacterized ORF becomes unknotted after an even deletion of 5 residues. However, this is sufficient to exclude an artifactual nature of these knots.

thumbnail
Figure 5. The two newly identified right-handed trefoil knots in recently deposited protein structures.

(A) On the top, the secondary structure and the accessible surface area (in transparency) of the human Carbonic Anhydrase VII, isoform 1 (3 mdz) is shown. On the bottom, a sausage view cartoon of the same enzyme is shown. In this representation, the diameter of the sausage is proportional to the B-factor. The thicker the backbone is, the more flexible it is. (B) The same representations as in (A) are shown for the knotted core of the uncharacterized ORF from Sulfolobus Islandicus rudivirus 1 (2x4i), chain A. Colors change continuously from blue (first residue) to red (last residue). The last residue of the 2x4i protein is colored in orange, since the structure presents a gap toward its true C-terminus end and results a slip-knot when the whole structure is considered, as detailed in the text.

https://doi.org/10.1371/journal.pone.0018693.g005

For what concerns recently reported trefoil knots, our results confirm the presence of a right-handed trefoil knot in the alpha subunit of human S-adenosylmethionine synthetase 2 (2p02) and the artifactual origin of the one detected in the ribosomal 80S-eEF2-sordarin complex of Saccharomyces cerevisiae (1s1h) first reported in [24].

Interestingly, we detected three left-handed trefoil knots respectively in the U2 snRNP Rds3p protein of S. Cerevisiae (2k0a), VirC2 protein of Agrobacterium tumefaciens (2rh3) and in the uncharacterized protein MJ0366 from Methanocaldococcus jannaschii (2efv). A fourth knot detected in the human prothrombin complexed with a peptidomimetic inhibitor (1jwt) was discarded due to a long structural gap. The left-handed trefoil knot in the Rds3p protein, which highlight a knotted zinc-finger motif, is the deepest knot of this kind reported to date [32]. Indeed, its knotted core is preserved after trimming of 19 and 18 residues from the C-terminus and the N-terminus respectively. Since this protein does not resemble protein belonging to the class, it shifts the left-handed to right-handed balance to 4 to 5, thus enforcing the non preferential handedness hypothesis.

Analysis of the MSR algorithm

As a secondary goal, we were interested in the characterization of an intrinsic feature of the MSR algorithm, the move lengths. Remarkably, differently from other proposed reduction schemes, here the move length is not constrained a priori to one (this can be easily seen in the animated reduction provided as Video S1). This characteristic leads to a particularly interesting class of curves which we call reduction curves, representing the time series of residual points during the reduction process. For example, Figure 6 illustrates the reduction of the above mentioned U2 snRNP Rds3p, the relevant reduction curve and move lengths.

To analyze these two features, 19316 protein structures were randomly extracted from the PDB, further selecting only those proteins of length comprised between the first (37 points) and the ninth deciles (357) of protein lengths (15529 structures). Proteins were processed with MSR and the number of residual points was associated to the corresponding move length at each reduction step.

thumbnail
Figure 6. MSR reduction curve of the U2 snRNP protein Rds3p.

On the middle are illustrated the 13 reduction steps (b-n) for the Rds3p protein (2k0a) (a). The last frame (n) represents the minimal structure of the protein, a left-handed trefoil knot. On the top, the residual points are plotted for each frame a-n. The corresponding move lengths are shown on the right.

https://doi.org/10.1371/journal.pone.0018693.g006

We first analyzed moves distribution. The observed distribution of move lengths is shown in Figure 7A, showing that quite long moves are rather frequent. In particular, move lengths quartiles are 0,4,13, the mean is 8.61 and 27% of the moves have length 0.

We then tested if move length depends on protein length. Proteins were sorted by length and the relevant move lengths were grouped in 100 equal sized bins, so that for instance the first bin contains moves corresponding to shortest proteins. As shown in Figure 7B, the mean of each bin significantly decreases (Mann-Kendall trend test, ) as a function of the protein length. An effect of final moves has been excluded by considering only the first 90% of the reduction process.

thumbnail
Figure 7. MSR algorithm analysis.

(A) The observed distribution of move lengths, considering only density values greater than 0.2%. (B) The mean move length significantly decreases as a function of the protein length. Values for each length percentile are reported. (C) Move length distributions are shown relatively to the first and fourth quartile of the reduction process, considering only density values greater than 0.2%. (D) Frequencies of classes of move lengths as a function of protein residual length at which they occur are dotted. LOESS curves are reported. Classes cutoffs were chosen according to move length quartiles (0,4,13) and the last 5% of residual lengths were discarded to remove frequency fluctuations.

https://doi.org/10.1371/journal.pone.0018693.g007

To assess if move length distribution changes during structure reduction, we compared the move distributions of the first and fourth quartile of the reduction process. To avoid overlaps, we considered reduction sequences of length at least 4 (14346 sequences). A significant difference between the two quartiles emerged (Wilcoxon test, ), as highlighted in Figure 7C. Moves with length up to 6 (short moves) are more frequent toward the end of the reduction process, while long moves occur preferentially in the first reduction quartile. This behavior is also confirmed by comparing the first and second half of the reduction process. However, shorter final moves are in principle explained by an increase of the edges mean length, as can be seen in Figure 6.

Finally, an interesting effect emerges when the frequencies of move lengths were analyzed as a function of the residual protein lengths at which they occur. By grouping move lengths in quartiles, while moves below the median reach the minimum frequency for a residual length around 60, the opposite behavior is attained by moves above the median (Figure 7D). Interestingly, a residual length around 60 is the optimum of the reduction process, where the frequency of 0 moves reaches its minimum and contextually the frequency of long moves is maximum.

Running time and complexity

The computation of the HOMFLY polynomial is known to be NP-hard [9], [33] and its running time exponentially increases with the number of crossings in the projection. However, the application of the MSR algorithm before the polynomial computation dramatically reduces the number of crossings, leading to a feasible computation of the HOMFLY polynomial for any structure analyzed in the present work. Indeed, the MSR algorithm has complexity in the number of points (i.e. the number of residues for a protein) and represents the dominant term in the total computational time for the vast majority of the analyzed structures, often independently from their knotted nature.

In practice, running times are reasonable for any analyzed PDB entry on a 2.4 GHz Intel Core 2 Duo processor with 2 Gb of RAM. On average, proteins of length 100, 200 and 300 take respectively 2, 10 and 20 seconds to be processed. The identification of the left-handed trefoil knot in the Rds3p (2k0a) requires 2.8 seconds (2.5 seconds for the MSR algorithm + 0.3 seconds for the polynomial computation), whereas the processing of the Stevedore's knotted protein (3bjx) takes 23.5 seconds (20 seconds + 3.5 seconds).

Implementation

All code for this work was written in Wolfram Mathematica 7 and executed on a Mac OSX platform. We developed the Mathematica package HPKnots.m based on the code provided as Text S3. HPKnots.m can be obtained upon request. The validation code also required KnotTheory.m, a third-party Mathematica package (http://katlas.org).

Conclusions

We have presented a novel topological framework for the HOMFLY polynomial computation of polygonal paths based on the geometric construction of Conway skein triples. Validation on tabulated knots and links demonstrates the global method robustness and the effectiveness of the greedy selection of the crossing to be switched. These evidences have been further confirmed by the polynomial computation of protein structures, also leading to an up-to date table of knotted structures. Whereas the performed topological checks allowed to discard artificially entangled proteins, two new right-handed trefoil knots have been detected.

Remarkably, the application range of the presented framework is not limited to proteins and it can be extended to the topological analysis of biological and synthetic polymers. Particularly, the study of knotted synthetic polymers like polyethylene has led to insights into the mechanical properties of such structures. The presence of a knot strongly weakens the polymer that potentially breaks at the entrance to the knot. Furthermore, knots frequency depends on the solvent and is higher in the coil phase than the globular phase with the knotted core size that increases as a function of the number of monomers. These aspects have been previously addressed with the computation of the Alexander polynomial in numerical simulations based on a simplified model of polyethylene [17]. Our framework can be successfully applied to this model and possible refinements, contributing to extend the knots spectrum so far considered and providing information about the knots chirality. Another suitable field of application of our method, in which generally more complex knots are investigated, is the topological study of cyclized DNA [5][7].

Finally, the applicability of the presented method is not confined to single component structures and can be applied to the topological study of multicomponent polygonal paths, providing a robust identification of knots or links when the frequency of entangled structures has to be addressed.

Supporting Information

Text S1.

Methods supporting information. This supplementary file details the computation of the intersection matrix and provides additional information on methods validation on tabulated knots and links and their application to protein structures.

https://doi.org/10.1371/journal.pone.0018693.s001

(PDF)

Text S2.

Generalized Reidemeister Moves. This supplementary file provides an illustrated description of a Generalized Reidemeister Move.

https://doi.org/10.1371/journal.pone.0018693.s002

(PDF)

Text S3.

Mathematica code. Mathematica code for the computation of the HOMFLY polynomial of a polygonal link. An application example on the Rds3p protein (2k0a) is provided.

https://doi.org/10.1371/journal.pone.0018693.s003

(PDF)

Table S1.

Table of knotted PDB entries. This supplementary table provides PDB ID and part details for each database entry that revealed a knotted structure. Entries are conveniently grouped by knot type.

https://doi.org/10.1371/journal.pone.0018693.s004

(PDF)

Video S1.

Minimal Structure Reduction of the alpha subunit of human S-adenosylmethionine synthetase 2. This supplementary movie show the reduction process of the human enzyme S-adenosylmethionine synthetase 2 (2p02), revealing a right-handed trefoil knot.

https://doi.org/10.1371/journal.pone.0018693.s005

(MOV)

Acknowledgments

We are grateful to Prof. Giovanni Gaudino for his support. We also thank Prof. Giovanni Battista Giovenzana and Dr. Franca Rossi for a lot of fruitful discussions and insights. Finally, we thank the reviewers for their helpful suggestions on the manuscript.

Author Contributions

Conceived and designed the experiments: FC MR. Performed the experiments: FC MR. Analyzed the data: FC MR. Wrote the paper: FC MR.

References

  1. 1. Lander ES, Waterman MS, editors. (1995) Calculating the Secrets of Life. Washington D.C.: National Academy Press.
  2. 2. Taylor WR (2007) Protein knots and fold complexity: Some new twists. Comput Biol Chem 31: 151–162.
  3. 3. Liu LF (1976) Knotted single-stranded DNA rings: A novel topological isomer of circular singlestranded DNA formed by treatment with Escherichia coli omega protein. J Mol Biol 106: 439–452.
  4. 4. Wasserman SA, Dungan JM, Cozzarelli NR (1985) Discovery of a predicted DNA knot substantiates a model for site-specific recombination. Science 229: 171–174.
  5. 5. Shaw SY (1993) Knotting of a DNA chain during ring closure. Science 260: 533–536.
  6. 6. Arsuaga J, Vázquez M, Trigueros S, Sumners D, Roca J (2002) Knotted probability of DNA molecules confined in restricted volumes: DNA knotting in phage capsids. Proc Natl Acad Sci U S A 99: 5373–5377.
  7. 7. Marenduzzo D, Orlandini E, Stasiak A, Sumners DW, Tubiana L, et al. (2009) DNA–DNA interactions in bacteriophage capsids are responsible for the observed DNA knotting. Proc Natl Acad Sci U S A 106: 22269–22274.
  8. 8. Mansfield ML (1994) Are there knots in proteins? Nat Struct Biol 1: 213–214.
  9. 9. Chen S, Dill KA (1996) Symmetries in proteins: A knot theory approach. J Chem Phys 104: 5964–5973.
  10. 10. Emmert-Streib F (2006) Algorithmic computation of knot polynomials of secondary structure elements of proteins. J Comp Biol 13: 1503–1512.
  11. 11. Erdmann MA (2005) Protein similarity from knot theory: Geometric convolution and line weavings. J Comp Biol 12: 609–637.
  12. 12. Taylor WR (2000) A deeply knotted protein structure and how it might fold. Nature 406: 916–919.
  13. 13. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, et al. (2000) The protein data bank. Nucleic Acids Res 28: 235–242.
  14. 14. Virnau P, Mirny LA, Kardar M (2006) Intricate knots in proteins: Function and evolution. PLoS Comput Biol 2: e122.
  15. 15. Kolesov G, Virnau P, Kardar M, Mirny LA (2007) Protein knot server: detection of knots in protein structures. Nucleic Acid Res 35: W425–W428.
  16. 16. Lai YL, Yen SC, Yu SH, Hwang JK (2007) pKNOT: the protein KNOT web server. Nucleic Acids Res 35: W420–W424.
  17. 17. Virnau P, Kantor Y, Kardar M (2005) Knots in globule and coil phases of a model polyethylene. J Am Chem Soc 127: 15102–15106.
  18. 18. Koniaris K, Muthukumar M (1991) Knottedness in ring polymers. Phys Rev Lett 66: 2211–2214.
  19. 19. Taylor WR, Aszódi A (2005) Protein Geometry, Classification, Topology and Symmetry. New York: Taylor & Francis.
  20. 20. Reidemeister K (1926) Elementare begründung der knotentheorie. Abh Math Sem Univ Hamburg 5: 24–32.
  21. 21. Alexander JW, Briggs GB (1926) On types of knotted curves. Ann of Math 28: 562–586.
  22. 22. Liang C, Cerf C, Mislow K (1996) Specification of chirality for links and knots. J Math Chem 19: 241–263.
  23. 23. Murzin AG, Brenner SE, Hubbard T, Chothia C (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247: 536–540.
  24. 24. Potestio R, Micheletti C, Orland H (2010) Knotted vs. unknotted proteins: Evidence of knotpromoting loops. PLoS Comput Biol 6: e1000864.
  25. 25. Conway JH (1970) An enumeration of knots and links, and some of their algebraic properties. In: Leech J, editor. Computational Problems in Abstract Algebra. pp. 329–358.
  26. 26. Freyd P, Yetter D, Hoste J, Lickorish WBR, Millett K, et al. (1985) A new polynomial invariant of knots and links. Bull Amer Math Soc (NS) 12: 239–246.
  27. 27. King NP, Yeates EO, Yeates TO (2007) Identification of rare slipknots in proteins and their implications for stability and folding. J Mol Biol 373: 153–166.
  28. 28. Vassiliev VA (1990) Cohomology of knot spaces. In: Arnold VI, editor. Theory of singularities and its applications. (Advances in Sovieth Mathematics, Vol. 1). 23 p.
  29. 29. Røgen P, Fain B (2003) Automatic classification of protein structure by using Gauss integrals. Proc Natl Acad Sci U S A 100: 119–124.
  30. 30. Lua RC, Grosberg AY (2006) Statistics of knots, geometry of conformations, and evolution of proteins. PLoS Comput Biol 2: e45.
  31. 31. Livingston C (1993) Knot Theory. Washington D.C.: The Mathematical Association of America.
  32. 32. van Roon AM, Loening NM, Obayashi E, Yang JC, Newman AJ, et al. (2008) Solution structure of the u2 snRNP protein rds3p reveals a knotted zinc-finger motif. Proc Natl Acad Sci U S A 105: 9621–9626.
  33. 33. Jaeger F (1988) Tutte polynomials and link polynomials. Proc Am Math Soc 103: 647–654.