Conceived and designed the experiments: HZ IS FD TL. Performed the experiments: HZ. Analyzed the data: HZ IS FD TL. Wrote the paper: HZ IS FD TL.
The authors have declared that no competing interests exist.
The study and comparison of protein-protein interfaces is essential for the understanding of the mechanisms of interaction between proteins. While there are many methods for comparing protein structures and protein binding sites, so far no methods have been reported for comparing the geometry of non-covalent interactions occurring at protein-protein interfaces.
Here we present a method for aligning non-covalent interactions between different protein-protein interfaces. The method aligns the vector representations of van der Waals interactions and hydrogen bonds based on their geometry. The method has been applied to a dataset which comprises a variety of protein-protein interfaces. The alignments are consistent to a large extent with the results obtained using two other complementary approaches. In addition, we apply the method to three examples of protein mimicry. The method successfully aligns respective interfaces and allows for recognizing conserved interface regions.
The Galinter method has been validated in the comparison of interfaces in which homologous subunits are involved, including cases of mimicry. The method is also applicable to comparing interfaces involving non-peptidic compounds. Galinter assists users in identifying local interface regions with similar patterns of non-covalent interactions. This is particularly relevant to the investigation of the molecular basis of interaction mimicry.
Protein-protein interactions are involved in most cellular processes as many proteins carry out their functions by forming complexes. These protein complexes consist of interacting polypeptide chains (subunits). The interfaces in such complexes are composed of complementary binding sites from the respective subunits.
The characterization of protein interfaces provides insights into protein interaction mechanisms. Such analysis is expected to have an impact on the prediction of interaction partners, as well as to assist in the design and engineering of protein interactions and interaction inhibitors. The physico-chemical properties of protein-protein interfaces have been previously investigated
Detailed comparison of protein-protein interfaces is fundamental for their better characterization and for structure-based classification of protein complexes. With an increasing amount of structural models for protein complexes available in the Protein Data Bank (PDB)
In a comprehensive study, Aloy
Local structure comparison of interfaces has been the focus of several other studies. Nussinov and colleagues have clustered all known protein-protein interfaces in the PDB by comparing the binding site Cα atoms using a geometric hashing procedure
Protein complexes are stabilized by non-covalent interactions formed across interfaces. Throughout this paper, "non-covalent interaction" refers to an interaction between specific functional groups, whereas "interaction" alone refers to the interaction between whole proteins, which is composed of many non-covalent interactions. Non-covalent interactions at protein-protein or protein-ligand interfaces are often compared in order to characterize binding modes and to identify detailed structural differences. Biswal and colleagues have manually examined van der Waals (vdW) interactions and hydrogen bonds at two interfaces corresponding to a polymerase binding to two different inhibitors
Here, we present a novel method, Galinter, for aligning protein-protein interfaces. To our knowledge, this is the first method for explicitly comparing the geometry of non-covalent interactions at interfaces. The explicit comparison of non-covalent interactions provides an intuitive method of comparative analysis and visualization of binding modes, and for investigating the degree of conservation between interfaces. We have tested Galinter on a published dataset of interfaces, and have also applied the method to analyzing three medically relevant cases of protein mimicry.
In this study, two types of non-covalent interactions are considered: van der Waals interactions and hydrogen bonds. These non-covalent interactions are represented as vectors (NCIVs) connecting the centers of two interacting atoms. The goal of the method is to find the largest set of NCIVs that can be superposed (structurally aligned) in similar geometric orientations. Two NCIVs (each from one interface) are matched in the alignment if they represent the same type of non-covalent interactions, and have similar distances and relative orientations to the other matched NCIVs within the respective interfaces. A graph-based method is applied for aligning NCIVs. The complete procedure is implemented in Galinter (
(NCIV: non-covalent interaction vector; CVec: contact vector)
For two protein complexes with known structures, two types of NCIVs between the interacting proteins are distinguished. Contact vectors (CVecs) are detected based on a distance criterion and represent van der Waals interactions. A CVec connects two heavy atoms if the distance between them is less than the sum of their respective van der Waals radii plus 1.0 Å. The user specifies one of the two binding sites as the head site and the other as the tail site. All CVecs point from the tail to the head site. Hydrogen bond vectors (HVecs) are the second type of NCIV. These are determined by adding hydrogen atoms to the protein structures with the REDUCE program
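The distance criterion for contact vectors can be illustrated with a minimal sketch. It is not the Galinter implementation: the `Atom` class, the `contact_vectors` function, and the van der Waals radii in `VDW_RADII` are assumptions introduced here for illustration. Only the 1.0 Å margin and the tail-to-head direction are taken from the text.

```python
import math
from dataclasses import dataclass

# Illustrative van der Waals radii in angstroms (assumed values, not from the paper).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}
MARGIN = 1.0  # angstroms added to the sum of the two radii, as stated in the text


@dataclass(frozen=True)
class Atom:
    element: str
    xyz: tuple  # (x, y, z) coordinates in angstroms


def contact_vectors(tail_atoms, head_atoms, radii=VDW_RADII, margin=MARGIN):
    """Return (tail_atom, head_atom) pairs that satisfy the CVec distance criterion."""
    vectors = []
    for t in tail_atoms:
        for h in head_atoms:
            distance = math.dist(t.xyz, h.xyz)
            if distance < radii[t.element] + radii[h.element] + margin:
                # Each vector points from the tail site to the head site.
                vectors.append((t, h))
    return vectors
```

The all-pairs loop is quadratic in the number of atoms; for whole complexes a spatial grid or k-d tree would be the usual way to restrict the search to nearby atoms.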
In this step, two CVecs are grouped into the same cluster if they are closer than 2.0 Å and if the angle between their orientations is at most 45°. Subsequently, a consensus vector is computed and then used as representative for each cluster. A complete linkage hierarchical clustering algorithm is employed to cluster the NCIVs. HVecs are not clustered and are directly taken as representatives. The distance between representatives is defined in the same way as the distance between NCIVs.
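The clustering step can be sketched as follows, under stated assumptions. The distance between two vectors is taken here to be the distance between their midpoints, and the consensus vector is taken to be the average of the member tails and heads; neither definition is given in this section, so both are placeholders. The 2.0 Å and 45° thresholds and the complete-linkage rule are from the text. All function names are hypothetical.

```python
import math
from itertools import combinations

MAX_DIST = 2.0    # angstroms, from the text
MAX_ANGLE = 45.0  # degrees, from the text


def midpoint(v):
    tail, head = v
    return tuple((a + b) / 2 for a, b in zip(tail, head))


def angle_deg(v, w):
    a = [h - t for t, h in zip(*v)]
    b = [h - t for t, h in zip(*w)]
    cos = sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def vec_distance(v, w):
    # Assumption: distance between vectors = distance between their midpoints.
    return math.dist(midpoint(v), midpoint(w))


def linkage(c1, c2):
    """Complete-linkage distance, or None if any cross pair violates a threshold."""
    worst = 0.0
    for v in c1:
        for w in c2:
            d = vec_distance(v, w)
            if d >= MAX_DIST or angle_deg(v, w) > MAX_ANGLE:
                return None
            worst = max(worst, d)
    return worst


def cluster_cvecs(cvecs):
    """Agglomerative clustering: repeatedly merge the closest admissible pair."""
    clusters = [[v] for v in cvecs]
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = linkage(clusters[i], clusters[j])
            if d is not None and (best is None or d < best[0]):
                best = (d, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]


def consensus(cluster):
    # Assumption: the representative is the mean of member tails and heads.
    n = len(cluster)
    tail = tuple(sum(v[0][k] for v in cluster) / n for k in range(3))
    head = tuple(sum(v[1][k] for v in cluster) / n for k in range(3))
    return tail, head
```

Because complete linkage requires every pair of members to satisfy both thresholds, no cluster can contain two vectors that are farther apart than 2.0 Å or that differ by more than 45° in orientation.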
This clustering step is based on the observation that CVecs often occur in small groups (2–4 members) with similar orientations (angle difference of at most 45°). Clustering NCIVs also reduces the size of the alignment problem and enables Galinter to obtain results within a reasonable run time (minutes).
In this step, each protein-protein interface is modeled as an undirected node-labeled edge-labeled graph
Given two graphs
The maximum common subgraph problem is transformed to the maximum clique problem in the traditional fashion
After obtaining the product graph, maximal cliques are detected
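The reduction from common-subgraph search to clique detection can be illustrated with a small sketch. It is an assumption-laden example rather than the published procedure: each representative is encoded as a hypothetical `(type, tail, head)` tuple, edge labels are taken to be the midpoint distance and the angle between two vectors, and the tolerances `DIST_TOL` and `ANGLE_TOL` are invented for illustration. The Bron-Kerbosch algorithm with pivoting is used as a standard way of enumerating maximal cliques; this section does not show which algorithm Galinter uses.

```python
import math
from itertools import combinations

DIST_TOL = 1.0    # angstroms; hypothetical tolerance
ANGLE_TOL = 30.0  # degrees; hypothetical tolerance


def _mid(v):
    return tuple((a + b) / 2 for a, b in zip(v[1], v[2]))


def _angle(v, w):
    a = [h - t for t, h in zip(v[1], v[2])]
    b = [h - t for t, h in zip(w[1], w[2])]
    cos = sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def edge_label(v, w):
    """Assumed edge label: (midpoint distance, angle between the two vectors)."""
    return math.dist(_mid(v), _mid(w)), _angle(v, w)


def product_graph(g1, g2):
    """Nodes pair up same-type vectors; edges join geometrically compatible pairs."""
    nodes = [(i, j) for i, v in enumerate(g1) for j, w in enumerate(g2) if v[0] == w[0]]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # a vector may be matched at most once
        d1, a1 = edge_label(g1[i], g1[k])
        d2, a2 = edge_label(g2[j], g2[l])
        if abs(d1 - d2) <= DIST_TOL and abs(a1 - a2) <= ANGLE_TOL:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; yields each maximal clique as a set of nodes."""
    def expand(r, p, x):
        if not p and not x:
            yield set(r)
            return
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for n in list(p - adj[pivot]):
            yield from expand(r | {n}, p & adj[n], x & adj[n])
            p = p - {n}
            x = x | {n}
    yield from expand(set(), set(adj), set())
```

Each clique in the product graph corresponds to a set of pairwise compatible matches, so the largest clique found is a candidate alignment of representatives.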
Up to this stage, the alignment consists of aligned representatives of NCIV clusters. In this step these aligned representatives are used as “anchors” for deriving the alignment between the original sets of NCIVs.
First, in an
After finding all the potential alignments of NCIVs, a
The resulting matched NCIVs replace the aligned representatives as new anchors, and the
The source code of Galinter is available upon request from the authors.
We have applied Galinter to the pilot dataset which was used for testing I2I-SiteEngine
For any pair of complexes to be compared, if at least one subunit of one complex is homologous to at least one subunit of the other complex, then the two complexes are labeled as S/D-homologous (single- or double-sided homologous). Otherwise the two complexes are labeled as non-homologous. Two subunit structures are considered to be homologous if they belong to the same superfamily in SCOP
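The labeling rule can be written down as a short sketch. The function name `homology_label` and the superfamily identifiers in the usage example are hypothetical; the distinction between single- and double-sided homology is read here as "one" versus "both" subunits having a distinct homologous partner in the other complex, which is an interpretation rather than a definition quoted from the text.

```python
def homology_label(complex_a, complex_b):
    """Label a pair of two-subunit complexes by shared SCOP superfamilies.

    Each complex is a pair of superfamily identifiers, one per subunit.
    """
    a1, a2 = complex_a
    b1, b2 = complex_b
    # Double-sided: both subunits find distinct homologous partners.
    if (a1 == b1 and a2 == b2) or (a1 == b2 and a2 == b1):
        return "double-sided"
    # Single-sided: at least one subunit of one complex has a homolog in the other.
    if {a1, a2} & {b1, b2}:
        return "single-sided"
    return "non-homologous"


# Hypothetical superfamily identifiers, for illustration only.
print(homology_label(("sfA", "sfB"), ("sfC", "sfB")))
```

Both "double-sided" and "single-sided" pairs fall under the S/D-homologous label used in the text.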
On the pilot dataset, Galinter alignments were compared to the alignments generated by the I2I-SiteEngine interface comparison method. I2I-SiteEngine matches chemical functional groups and associated residues at the binding sites of different interfaces. In addition, we compared the results of both Galinter and I2I-SiteEngine to alignments based on backbone structure, generated with DaliLite
In this work, we define interface residues as those which contain at least one interface atom, where interface atoms are the atoms involved in interface NCIVs. We compared the alignment of interfaces from the different methods (Galinter, I2I-SiteEngine, and DaliLite) by examining the deviation of Cα atom coordinates of interface residues after corresponding transformations. Given two interface residue sets
To assess whether Galinter produces valid interface alignments, we compared the results of Galinter to the alignments generated by other approaches. One of these approaches aligns functional chemical groups at interfaces (I2I-SiteEngine) and the other approach aligns backbone structures (DaliLite).
In the second part of this section, we present the application of Galinter to three mimicry cases, for which the interfaces have been manually compared before.
We have applied Galinter to every pair of interfaces within each of the 14 groups from the pilot dataset, giving 240 comparisons in total. The mean run time for these comparisons is 138.5 seconds (median 71.5 seconds) on a standard desktop computer (3.0 GHz CPU, 1 GB memory). The alignment results are compared to those of I2I-SiteEngine and DaliLite. The extent of agreement is measured using irRMSD values as described in section “
I2I-SiteEngine compares interfaces by aligning the functional groups at binding sites, instead of aligning molecular interactions within the interface like Galinter. Galinter and I2I-SiteEngine can be regarded as complementary approaches as they use different properties to compare interfaces.
Backbone structure comparison methods like DaliLite can be used to generate interface alignments indirectly. These alignments are indirect in the sense that they do not take the structural similarities of the interfaces into account explicitly. When the interaction orientations of subunits are conserved between S/D-homologous complexes, these indirect alignments provide a coarse way of validating alignments from direct methods like Galinter and I2I-SiteEngine. The alignments based on backbone structures are expected to agree with explicit alignments of non-covalent interactions within the interfaces to some extent but not necessarily to match them.
Most interfaces of non-homologous complexes cannot be compared using backbone alignment methods. Thus, for the alignments of non-homologous complex interfaces, only an overview of irRMSD values for the comparison between Galinter and I2I-SiteEngine is shown. (Hm: S/D-homologous; NonHm: non-homologous; Gal: Galinter; I2I: I2I-SiteEngine; Dal: DaliLite)
We have explored possible causes for the disagreements between the alignments of different methods. For non-homologous interfaces, most of the disagreements are observed in groups 19 and 5. Group 19 consists of coiled-coil interfaces. More than a single solution is expected for the alignment of these repetitive structures. Therefore it is not surprising that the alignments from different methods disagree. In general, the alignments of both methods result in reasonable superimposition of the helix backbones. Nevertheless, visual inspection reveals that for some of these pairs one of the methods generates better superposition of the interacting helices. Galinter produces better superposition in five pairs (1ic2CD
In group 5, there are relatively few similarities between the subunits from different complexes. There seems to be no obvious alignment solution in terms of either structure or evolution. The only evident common feature in these interfaces is that they include two interacting β-strands. The assessment of the results in this group is thus challenging. Bearing this in mind, we have investigated the quality of the results by visual inspection of the superposition of the two strands at the interfaces. We have found that for 15 pairs Galinter provides better superposition of the interface β-strands, and for five pairs I2I-SiteEngine leads to better superposition of these strands.
The disagreements between Galinter and I2I-SiteEngine for S/D-homologous interfaces arise mainly from group 10, and also to a lesser extent, from the smaller group 4. Interestingly, for these two groups, the Galinter alignments agree with those based on DaliLite.
In general, the three methods agree to a large extent, especially when the interfaces are related by homology. Nevertheless, it is not surprising to observe disagreements in the non-homologous groups, considering both that Galinter and I2I-SiteEngine are based on different interface properties and that there are no unique solutions in these groups.
The current implementation of Galinter aligns vdW interactions and hydrogen bonds at interfaces. However, there are other types of non-covalent atomic interactions, especially electrostatic interactions between positively and negatively charged atoms. Thus we have explored the contribution of short-range electrostatic interactions to the alignment of protein-protein interfaces. Using a definition by Xu
These results suggest that the current method is robust with respect to different weightings of the various types of interactions. Nevertheless, a thorough investigation of how to weight different types of non-covalent interactions for interface alignment is required, and will be the focus of future work.
Protein mimicry is relevant in the design of protein inhibitors. These inhibitors are frequently designed such that their binding mode is similar to that of a wild-type protein-protein interaction. Their development process is expected to benefit from detailed comparisons of the non-covalent interactions. We have applied Galinter to studying the protein-protein interaction mechanisms of three cases of protein mimicry: i) Chymotrypsin and subtilisin interact with the same type of inhibitors, an example of convergent evolution
In each of these three cases, the subunits are homologous only on one side of the interface. In the third case, one of the interacting partners is not even a protein.
The Ser–His–Asp catalytic triad present in many proteases has been intensively analyzed
We have analyzed the interactions formed between chymotrypsin and leech proteinase inhibitor eglin c (PDB code: 1acb, chains E and I), and subtilisin with chymotrypsin inhibitor 2 (PDB code: 1lw6, chains E and I). The two protease inhibitors have similar backbone structures and belong to the same SCOP family (b.40.1.1). The two interfaces contain 299 and 332 NCIVs, respectively. The longest Galinter alignment consists of 117 aligned NCIVs, and the results are visualized in
Every example is shown with two representations in the same orientation. In all representations, the homologous side is in light blue and light yellow at the top, the mimic side is shown in dark blue and orange at the bottom. NCIVs at interfaces are shown as thin lines. A) Superposed inhibitors and catalytic triads for chymotrypsin (1acb) and subtilisin (1lw6) according to the Galinter alignment. The inhibitor for chymotrypsin is shown in light blue and the inhibitor for subtilisin is shown in light yellow. The catalytic triads of chymotrypsin and subtilisin are shown as sticks in dark blue and orange, respectively. The chymotrypsin binding site is shown as a gray surface. B) Superposed NCIVs for chymotrypsin/inhibitor interface (1acbEI) and subtilisin/inhibitor interface (1lw6EI) according to the Galinter alignment. Only matched NCIVs are shown. Chymotrypsin/inhibitor NCIVs are shown in cyan, and subtilisin/inhibitor NCIVs are shown in yellow. C) Superposed NCIVs for CD4/gp120 interface (1rzjCG) and CD4M33-F23/gp120 interface (1yymMG) according to the Galinter alignment. CD4 is shown in dark blue and CD4M33-F23 is in orange. Only matched NCIVs are shown. CD4/gp120 NCIVs are shown in cyan, and CD4M33-F23/gp120 NCIVs are in yellow. Hydrogen bonds are shown as thick lines. D) An enlarged view of the matched NCIVs involving the hot spot phenylalanines. E) Superposed NCIVs according to the Galinter alignment of IL-2Rα/IL-2 interface (1z92BA) in dark and light blue, and of SP4206/IL-2 interface (1py2_A) in orange and light yellow. Only matched NCIVs are shown. IL-2Rα/IL-2 NCIVs are shown in cyan, SP4206/IL-2 NCIVs are in yellow. The hot spot residues Phe42, Tyr45, and Glu62 in IL-2 are shown as sticks. F) An enlarged view of the
We have also compared the two interfaces based on inhibitor backbone alignment. First the inhibitor structures of the two complexes have been aligned using DaliLite. Then the two proteases have been superposed accordingly. This way an alignment of the interfaces is obtained indirectly. This indirect alignment agrees with the Galinter alignment to a considerable extent (irRMSD = 2.7 Å). Based on this indirect alignment, the RMSD for the overall functional template atoms of the catalytic triads is much larger than the one obtained based on the Galinter alignment (2.2 Å
In order for HIV to infect host cells, the HIV envelope glycoprotein gp120 binds CD4 receptors located on the target cell surfaces. The CD4 binding site for gp120 has been engineered onto a scorpion-toxin protein, resulting in CD4M33-F23. Recently, the mimicked interaction of CD4M33-F23 in complex with gp120 has been investigated in detail and compared to the native complex structure of CD4 and gp120
We have compared the native complex interface (PDB code: 1rzj, chains C and G) and the mimicry interface (PDB code: 1yym, chains M and G) using Galinter. The numbers of NCIVs are 364 for 1rzjCG and 166 for 1yymMG. Despite the lack of similarity between the overall folds of CD4 and CD4M33-F23, about 80% (133 NCIVs) of the NCIVs at the CD4M33-F23/gp120 interface have been aligned to those at the CD4/gp120 interface. In addition, three of the four interface hydrogen bonds aligned as described in Huang
We have also observed that, in both interfaces, the hot spot residue Phe43 in CD4 (or the equivalent residue Phe23 in CD4M33-F23) is in contact with eight residues of gp120 (Asp368, Glu370, Ile371, Asn425, Met426, Trp427, Gly473, and Met475) via 46 vdW interactions, out of the 133 aligned NCIVs in total. Galinter has successfully aligned all of these NCIVs (
Thanos
We have compared the interface of IL-2Rα and IL-2 (PDB code: 1z92, chains B and A), with the interface formed between SP4206 and IL-2 (PDB code: 1py2, FRH and chain A) using Galinter. The protocol has been slightly modified in order to identify hydrogen bonds between a non-peptidic molecule and a protein. HBPLUS
We have applied I2I-SiteEngine to align the three pairs of mimicry interfaces. In the case of the two protease-inhibitor interfaces, I2I-SiteEngine generates an alignment similar to that of Galinter, with an irRMSD of 1.0 Å. The RMSD for the overall functional template atoms of the two catalytic triads is worse than that calculated based on the Galinter alignment (1.1 Å
We have presented Galinter, a novel method for explicitly comparing interfaces based on the geometry and type of non-covalent interactions. The proposed method complements existing approaches to the analysis of protein-protein interfaces. The method was applied to the pilot dataset
Currently, the final Galinter alignments of NCIVs are ranked by their size in terms of the number of involved NCIVs, but a more comprehensive scoring function for alignments is desirable. Geometric and chemical similarity of matched NCIVs should be taken into account when computing alignment scores. Ideally such a scoring function should provide a statistical significance value for each alignment as well. This will be the focus of future work.
We have demonstrated the application of Galinter to the comparison of protein-protein interfaces, and also to the comparison of a protein-protein interface with an interface between a protein and a non-peptidic molecule (ligand). Galinter may also be applied to comparing two protein-ligand interfaces with each other, but the approach needs further testing for this purpose. In addition, the interfaces in the current work have been defined between different polypeptide chains. However, the method is also applicable to the comparison of interfaces formed between protein domains within the same chain.
In the comparison of SP4206/IL-2 and IL-2Rα/IL-2, we have observed a non-uniform distribution of conserved NCIVs throughout the two interfaces. The NCIVs involving residue Arg36 on IL-2Rα and its counterpart guanido group on SP4206 are highly conserved. Similar results have also been observed in the first and second case studies. In the case of the protease/inhibitor interfaces, a large fraction of aligned NCIVs involve the two catalytic residues serine and histidine. At CD4/gp120 and CD4M33-F23/gp120 interfaces, Phe43 in CD4 and Phe23 in CD4M33-F23, respectively, form 46 NCIVs with eight surrounding residues (see
Geometric criteria for identifying hydrogen bonds.
(1.17 MB TIF)
Pilot dataset.
(0.60 MB TIF)
Galinter vs. I2I-SiteEngine. Heat maps for irRMSD values of interface residues. Only the 14 non-singleton groups in the pilot dataset are shown. The heat maps are sorted by size. The columns and rows for each heat map represent interfaces identified by their PDB code and chain names constituting the interfaces. The diagonal grids of all heat maps have been left blank. For S/D-homologous complexes, S/D-homology is indicated in corresponding grids by either a plus sign (+) for double-sided homology, or a minus sign (−) for single-sided homology. The heat maps have been produced using R (
(0.78 MB TIF)
Galinter vs. DaliLite. Heat maps for irRMSD values of interface residues. Only the 14 non-singleton groups in the pilot dataset are shown. The heat maps are sorted by size. The columns and rows for each heat map represent interfaces identified by their PDB code and chain names constituting the interfaces. The diagonal grids of all heat maps have been left blank. For S/D-homologous complexes, S/D-homology is indicated in corresponding grids by either a plus sign (+) for double-sided homology, or a minus sign (−) for single-sided homology. The heat maps have been produced using R (
(0.77 MB TIF)
I2I-SiteEngine vs. DaliLite. Heat maps for irRMSD values of interface residues. Only the 14 non-singleton groups in the pilot dataset are shown. The heat maps are sorted by size. The columns and rows for each heat map represent interfaces identified by their PDB code and chain names constituting the interfaces. The diagonal grids of all heat maps have been left blank. For S/D-homologous complexes, S/D-homology is indicated in corresponding grids by either a plus sign (+) for double-sided homology, or a minus sign (−) for single-sided homology. The heat maps have been produced using R (
(0.77 MB TIF)
Alignment of interfaces based on backbone structure. Using DaliLite, subunit structures are compared individually at both sides of interfaces. A subsequent alignment of interface residues can be derived based on the most significant DaliLite alignment of subunit structures.
(1.38 MB TIF)
Comparison of interface alignments using irRMSD measure. Given two interface residue sets
(1.30 MB TIF)
We are grateful to Oliver Sander and Andreas Steffen for fruitful discussions. We thank Gabriele Mayr and Mario Albrecht for helpful comments on the manuscript.