Research Article

Community-Based Network Study of Protein-Carbohydrate Interactions in Plant Lectins Using Glycan Array Data

  • Adeel Malik,

    Affiliation: Center for In Silico Protein Science, School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea

  • Juyong Lee,

    Affiliation: Center for In Silico Protein Science, School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea

  • Jooyoung Lee

    jlee@kias.re.kr

    Affiliation: Center for In Silico Protein Science, School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea

  • Published: April 22, 2014
  • DOI: 10.1371/journal.pone.0095480

Abstract

Lectins play major roles in biological processes such as immune recognition and regulation, inflammatory responses, cytokine signaling, and cell adhesion. Recently, glycan microarrays have been shown to play key roles in understanding glycobiology, allowing us to study the relationship between the specificities of glycan binding proteins and their natural ligands at the omics scale. However, one of the drawbacks in utilizing glycan microarray data is the lack of systematic analysis tools to extract information. In this work, we attempt to group various lectins and their interacting carbohydrates by using community-based analysis of a lectin-carbohydrate network. The network consists of 1119 nodes and 16769 edges, and we have identified 3 lectins with large degrees of connectivity that play the roles of hubs. The community-based network analysis provides an easy way to obtain a general picture of the lectin-glycan interaction and many statistically significant functional groups.

Introduction

Glycans play important roles inside eukaryotic cells by binding to proteins and lipids, and they are also found in the extracellular space between cells [1]. Glycans can be grouped into two classes: linear sugars and polysaccharides. The polysaccharides consist of repeating pyranose monosaccharide rings and branched sugars, which are formed by linking various monosaccharide units [2]. Through non-covalent interactions with lectins, glycans control biochemical reactions by engaging in various biological processes such as development [3], [4], coagulation [5] and response to infection by bacterial and viral agents [6]. The size of the cellular glycome is believed to be in the range of 100000–500000 glycans [7]. This large size could be attributed to the combinatorial nature of glycan structures: oligosaccharide chains come in either linear or branched form, monosaccharide building blocks are in either α or β anomeric configurations, and monosaccharides can be linked via various carbon atoms in their sugar rings [8]. Cells exploit the complexity of the glycome to encode a massive amount of biological information, and it is a great challenge to decode this hidden information to understand the biology of lectins and their interactions with carbohydrates.

Protein-carbohydrate interactions are involved in a variety of biological and biochemical processes, and attempts to understand the molecular basis of such interactions have recently appeared [9]. Traditional methods to probe glycan–protein recognition events include X-ray crystallography, NMR spectroscopy, the hemagglutination inhibition assay [10], enzyme-linked lectin assay [11], surface plasmon resonance [12] and isothermal titration calorimetry [13]. Although these methods have been successfully applied to elucidate the details of carbohydrate–protein interactions, they are rather labor intensive and require large amounts of carbohydrate samples. These shortcomings make the traditional approaches unsuitable as high-throughput analytic methods [14]. On the other hand, many computational methods have recently been suggested to study protein-carbohydrate interactions [15]–[21].

Conventional methods for carbohydrate ligand detection are often cumbersome, and sensitive, high-throughput technologies are needed that can analyze carbohydrate-protein interactions in order to discover and differentiate oligosaccharide sequences interacting with carbohydrate binding proteins [8]. Carbohydrate microarray based technology can serve as an appropriate method [22]–[25]. However, at present, one of the biggest limiting factors in utilizing the complete potential of glycan microarray data is the lack of efficient analysis tools to extract relevant information.

For complete utilization of glycan microarray data, a systematic computational method is needed [26]. Large quantities of data are generated from the analysis of the Consortium for Functional Glycomics (CFG) glycan microarray [27]. Also, predicting the glycan-binding specificity or binding motif can be a time-consuming step of scrutinizing and evaluating the linear sequences of monosaccharides in glycans [27]. The CFG offers glycan microarray data for various lectins (of both plant and animal origin) and glycan binding antibodies. Recently, computational methods have been developed for analyzing the glycan-binding specificity from glycan array data, such as the motif-segregation method [26] and the outlier motif analysis (OMA) method [28].

In this work, we have developed a method to group various plant lectins and their interacting carbohydrates by the community detection analysis of a lectin-glycan network generated by the glycan microarray data from CFG. The lectin-glycan network consists of 1119 nodes (lectins and glycans) and 16769 edges (interactions). From this network, we have identified 3 lectins having large degrees of connectivity playing the roles of hubs. Additionally, we compared the results of our community detection method with other well known clustering algorithms. We show that our method outperforms existing clustering methods in terms of both modularity score as well as the number of statistically significant (p-value ≤0.05) glycan specific lectin groups. We propose that this study can reveal a global organization of lectin-glycan interactions, and help to identify strongly correlated lectin and glycan clusters.

Methodology

Data Generation

A total of 786 glycan array files for plant lectins were downloaded using a custom-made script from the Consortium for Functional Glycomics (CFG) as of Dec 2013. CFG provides extensive glycomics resources so that one can explore functions of glycans and glycan-binding proteins that play important roles in human health and disease [http://www.functionalglycomics.org/static/consortium/consortium.shtml]. All of these 786 files were further processed into a single input file, which consists of rows of protein-carbohydrate pairs. Three datasets were generated by filtering the protein-carbohydrate pairs using relative fluorescence unit (RFU) cutoff values of 5000, 10000 and 20000. These three datasets were used for network construction and community detection. Figure 1 shows the histogram of the RFU values collected from the 786 glycan array files. The data corresponding to RFU larger than 5000 constitute only about 3.5% of the whole data. All the data are available to researchers upon request.
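The filtering step can be sketched as follows. This is a minimal illustration, not the authors' script: the row layout `(array_id, glycan_id, rfu)`, the function name `filter_pairs`, the toy rows, and the choice of an inclusive cutoff (RFU at least the cutoff) are all assumptions made here.

```python
from typing import Iterable, List, Tuple

# Hypothetical row layout for the merged input file: (array_id, glycan_id, rfu)
Pair = Tuple[str, str, float]

def filter_pairs(pairs: Iterable[Pair], cutoff: float) -> List[Pair]:
    """Keep only lectin-glycan pairs whose RFU is at least `cutoff` (assumed inclusive)."""
    return [pair for pair in pairs if pair[2] >= cutoff]

# Toy rows invented for illustration only; they are not CFG data.
rows = [
    ("A1", "G1", 25000.0),
    ("A1", "G2", 7500.0),
    ("A2", "G1", 12000.0),
    ("A2", "G3", 300.0),
]

# One dataset per RFU cutoff used in the study.
datasets = {cutoff: filter_pairs(rows, cutoff) for cutoff in (5000, 10000, 20000)}
```

Each stricter cutoff yields a subset of the previous dataset, which is why the three networks shrink as the cutoff increases.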


Figure 1. Histogram of the RFU values collected from 786 glycan array files is shown.

It should be noted that the y-axis is shown in the log scale and the data corresponding to RFU larger than 5000 constitutes only about 3.5% of the whole data.

doi:10.1371/journal.pone.0095480.g001

Network Construction

To perform a systematic analysis of protein-carbohydrate interaction, we have constructed a bipartite network, where unweighted edges are assigned between proteins and carbohydrates. Each node represents a lectin or a glycan and its identity is indicated by its array ID or glycan ID at a given condition. A glycan array ID represents a specific protein under a specific condition. Therefore, two different nodes in the network may represent two different concentrations of a protein in the glycan array experiment. The strength of a lectin-glycan interaction is represented by its RFU value and three networks are generated using three cutoff values of RFU of 5000, 10000 and 20000.
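A bipartite, unweighted network of this kind can be sketched with two adjacency maps, one per node type. The function name `build_bipartite`, the identifiers `L1`/`G1`, and the RFU values below are hypothetical; the RFU value is used only to decide whether an edge exists, matching the unweighted edges described above.

```python
from collections import defaultdict

def build_bipartite(pairs, cutoff):
    """Return (lectin -> glycans, glycan -> lectins) adjacency maps.

    An unweighted edge is created when the RFU of a pair reaches `cutoff`
    (inclusive cutoff is an assumption of this sketch).
    """
    lectin_nbrs = defaultdict(set)
    glycan_nbrs = defaultdict(set)
    for array_id, glycan_id, rfu in pairs:
        if rfu >= cutoff:
            lectin_nbrs[array_id].add(glycan_id)
            glycan_nbrs[glycan_id].add(array_id)
    return dict(lectin_nbrs), dict(glycan_nbrs)

# Toy pairs invented for illustration; "L1" and "L2" could be the same protein
# measured at two concentrations, since each array ID is its own node.
pairs = [
    ("L1", "G1", 42000.0),
    ("L1", "G2", 6100.0),
    ("L2", "G1", 9000.0),
    ("L2", "G3", 1200.0),
]

lectins, glycans = build_bipartite(pairs, cutoff=5000)
n_nodes = len(lectins) + len(glycans)
n_edges = sum(len(nbrs) for nbrs in lectins.values())
```

Because edges only ever join a lectin node to a glycan node, counting edges from the lectin side alone is sufficient.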

Community Detection of a Network

We have identified the community structure of the lectin-glycan network by using the Mod-CSA method, which is a highly effective modularity optimization method [29], [30], [31]. Modularity is a widely used measure for determining the community structures of various networks. For a given community structure, it measures the difference between the number of intra-community edges and its expected value in a randomly re-wired counterpart that preserves the degrees of the nodes. Modularity (Q) is defined as:

$$Q = \sum_{i=1}^{N_c} \left[ \frac{l_i}{M} - \left( \frac{d_i}{2M} \right)^2 \right]$$

where $M$ is the total number of edges in the network, $N_c$ is the number of communities, $l_i$ is the number of edges within community $i$ and $d_i$ is the sum of degrees of nodes in community $i$. The value of Q ranges between −1 and 1; it becomes close to 1 for a highly modular community structure and 0 for a random community structure [32].
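To make the definition concrete, the sketch below evaluates Q for a given partition: for each community it takes the fraction of edges inside the community and subtracts the squared fraction of edge endpoints that belong to it. It only scores a partition and does not reproduce the Mod-CSA optimization; the function name, the variable names `intra` and `degree_sum`, and the toy network are assumptions of this example.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted network.

    edges: list of (u, v) pairs; community: dict mapping node -> community label.
    """
    m = len(edges)
    intra = defaultdict(int)       # edges with both ends inside community i
    degree_sum = defaultdict(int)  # sum of node degrees in community i
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Toy bipartite network invented for illustration.
edges = [("L1", "G1"), ("L1", "G2"), ("L2", "G1"), ("L2", "G2"), ("L3", "G3")]
two_groups = {"L1": 0, "L2": 0, "G1": 0, "G2": 0, "L3": 1, "G3": 1}
one_group = {node: 0 for node in two_groups}

q_two = modularity(edges, two_groups)  # (4/5 - 0.8**2) + (1/5 - 0.2**2) = 0.32
q_one = modularity(edges, one_group)   # 5/5 - 1.0**2 = 0.0
```

Placing every node in a single community always gives Q = 0, which is why a positive score indicates more intra-community edges than expected by chance.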

Network Visualization and Comparison with other Clustering Methods

The three lectin-glycan array networks constructed in this study were exported to Cytoscape 2.8.2, a bioinformatics package for biological network visualization and data integration [33]. To compare our clustering method with other widely used network clustering algorithms such as MCL [34], [35], MCODE [36] and the greedy algorithm [32], we used the clusterMaker [37] and GLay [38] plugins, which are multi-algorithm clustering plugins for Cytoscape.

Enrichment of Glycan-specific Proteins

Enriched glycan-specific lectins within each cluster were investigated by annotating each lectin with a predetermined glycan binding specificity. Reported specificities of various lectins were extracted from the literature [39], [40] and the UniProt database [41], as summarized in Table 1. The full list of all 513 protein nodes used in this study, with annotations wherever possible, is given in Table S1.


Table 1. List of glycan binding specificities of lectins investigated in this study is shown. Specificities are collected from the literature and the UniProt database.

doi:10.1371/journal.pone.0095480.t001

The enrichment of glycan-specificities of lectins in each cluster was assessed by calculating the hypergeometric p-value. The p-value corresponds to the probability that a given lectin cluster sharing the same glycan-specificity can be obtained by chance. The p-value was calculated as follows:

$$p = \sum_{i=k}^{\min(n,K)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}$$

where N is the total number of lectins in the network, K is the number of all lectins having a particular glycan-specificity, and k is the number of lectins having the particular glycan-specificity in a cluster with the size of n.
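As a worked illustration of the enrichment test, the sketch below computes the upper-tail hypergeometric probability of seeing at least k lectins with a given specificity in a cluster of size n, using the symbols N, K, n and k defined above. The function name `enrichment_p_value` and the toy numbers are assumptions; the upper-tail form is the standard choice for over-representation and is assumed here.

```python
from math import comb

def enrichment_p_value(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: lectins in the network, K: lectins with the specificity,
    n: cluster size, k: lectins with the specificity inside the cluster.
    """
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)

# Toy numbers invented for illustration: 10 lectins, 4 of them fucose specific,
# and a cluster of 3 lectins that contains 2 fucose-specific ones.
p_toy = enrichment_p_value(N=10, K=4, n=3, k=2)  # (6*6 + 4*1) / 120 = 1/3
```

With k = 0 the sum covers every possible outcome and the function returns 1, which is a convenient sanity check.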

Enrichment analysis was also attempted by using the DAVID functional annotation clustering tool [http://david.abcc.ncifcrf.gov/home.jsp], which did not yield any statistically significant clusters. We then manually searched for each lectin in the InterPro database [42], but only 8 unique GO terms, such as chitin-binding, carbohydrate-binding, protein binding and endopeptidase inhibitor activity, were retrieved. These GO terms are too general to signify any detailed glycan binding specificities of the corresponding lectins. Therefore, in this study, the enrichment analysis for each cluster was performed based on the annotations listed in Table 1. Only those clusters with at least 10 protein nodes were analyzed for statistical significance.

Identification of Hub Proteins

In general, biological networks possess the scale-free property [43], in which only a few nodes in the network have many connections and serve as hubs. Hub proteins were identified by calculating the node degree distribution [44] using the NetworkAnalyzer plugin of Cytoscape. The three protein nodes with the highest degrees were assigned as hubs (see Figure 2).
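The study computed degrees with the NetworkAnalyzer plugin; the same quantity can be sketched in a few lines of plain Python. The function names and the toy edge list below are hypothetical, and ties are broken alphabetically only to keep the output deterministic.

```python
from collections import Counter

def lectin_degrees(edges):
    """Count how many distinct glycans each lectin node is connected to."""
    return Counter(lectin for lectin, _glycan in set(edges))

def top_hubs(degrees, n_hubs=3):
    """Return the n_hubs highest-degree nodes as (node, degree) pairs."""
    return sorted(degrees.items(), key=lambda item: (-item[1], item[0]))[:n_hubs]

# Toy edge list invented for illustration.
edges = (
    [("L1", g) for g in ("G1", "G2", "G3", "G4")]
    + [("L2", g) for g in ("G1", "G2", "G3")]
    + [("L3", g) for g in ("G1", "G2")]
    + [("L4", "G1")]
)

hubs = top_hubs(lectin_degrees(edges))
```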


Figure 2. The node degree distribution of the lectin-glycan network is shown.

We observe a large gap between 3 hub nodes and the other nodes. The degree distribution was plotted using plotly [https://plot.ly/plot].

doi:10.1371/journal.pone.0095480.g002

Results and Discussion

We constructed three lectin-glycan interaction networks by using the plant lectin-glycan microarray data filtered by three RFU cut-offs. The network retaining interactions with RFU ≥5000 consists of 1119 nodes (513 proteins and 606 carbohydrates) and 16769 edges. Similarly, the second network (RFU ≥10000) has 1035 nodes and 12169 edges, and the third one (RFU ≥20000) consists of 901 nodes and 8042 edges. Since the first network has the largest number of nodes and edges, and shows more statistically significant glycan specific groups (discussed later) than the other two networks, the results reported henceforth refer to the first network unless otherwise indicated. The first network is shown in Figure 3, where proteins are represented as diamonds, glycans as circles, and the interactions between them as edges.


Figure 3. The lectin-glycan network generated using the RFU cut-off of 5000 is shown.

Circles represent glycan nodes and diamonds represent lectin nodes. The nodes are color coded according to their communities. Three hub nodes (shown in green diamonds) are PP2A1, WGA1 and RCA.

doi:10.1371/journal.pone.0095480.g003

The network representation enables a quick visual inspection of the glycans bound to a lectin of interest. Additionally, in order to identify hub lectins from the lectin-glycan array, the node degree distribution of the network was calculated and is shown in Figure 2. In an interaction network, proteins that interact with a large number of partners are considered as hubs [45], and are essential components of biological networks [46]. The definition of the hub node is rather subjective, but based on the observation of the biggest gap between the 3rd and 4th largest degree nodes in Figure 2, we assigned hub proteins as those three with degree larger than 220. The 3 hubs are Phloem Protein2 (PP2A1) from Arabidopsis thaliana, wheat germ agglutinin (WGA) from Triticum vulgaris (wheat), and Ricinus communis agglutinin (RCA) from Ricinus communis (castor bean).
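The gap criterion described above was applied by inspecting Figure 2. One way to mimic it programmatically, offered only as an assumption about how such a rule could be automated, is to sort the degrees and cut at the largest difference between consecutive values. The function name and the degree values below are invented for illustration.

```python
def hubs_by_largest_gap(degrees):
    """Return the nodes ranked above the largest gap in the sorted degree list."""
    ranked = sorted(degrees.items(), key=lambda item: -item[1])
    if len(ranked) < 2:
        return ranked
    # gaps[i] is the drop in degree between rank i and rank i + 1
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    split = gaps.index(max(gaps))
    return ranked[: split + 1]

# Hypothetical degrees, not taken from the study.
toy_degrees = {"h1": 250, "h2": 240, "h3": 230, "n1": 120, "n2": 110, "n3": 90}
hub_nodes = [name for name, _degree in hubs_by_largest_gap(toy_degrees)]
```

On real data the largest gap may not separate a small set of hubs from the rest, so a visual check of the degree distribution remains advisable.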

By using the Mod-CSA method, the lectin-glycan network is clustered into 4 modules (communities), which are represented by separate colors in Figure 3. The largest module consists of 168 protein nodes and 215 glycan nodes, and the smallest community contains 98 protein nodes and 133 glycan nodes.

To validate the lectin-glycan interaction network and its detected community structure, we investigated the binding specificities of the first neighbors of two plant lectins, Sambucus nigra agglutinin (SNA) and concanavalin A (ConA), whose glycan binding specificities are well known. The first lectin is a well-characterized plant lectin, elderberry bark agglutinin from Sambucus nigra, which is known to recognize the Neu5Acα2-6Gal linkage [47]. The second one is concanavalin A (ConA), which is known to have specificity for mannose sugars [48], [49], [50]. Proper categorization of the specificities of glycan-binding proteins plays a significant role in understanding protein-glycan interactions and utilizing glycan-binding proteins as analytical reagents.

Binding Specificities of SNA

It is well known that some plants contain more than one lectin with different sugar binding specificities [51]. The bark of the elderberry (Sambucus nigra) has two lectins, SNA-I and SNA-II, with different glycan binding specificities. Sambucus nigra agglutinin I (SNA-I) is the first lectin identified from the elderberry bark and has been conventionally employed to recognize the Neu5Acα2-6Gal [47] or Neu5Acα2-6Galβ1-4GlcNAc sequence [27]. SNA-I is composed of two polypeptides, namely chain A of 33 kDa with enzymatic activity, and chain B of 35 kDa with carbohydrate-binding activity [52]. Molecular modeling studies have indicated that the overall structure of SNA-I is quite similar to that of ricin [53], and SNA-I belongs to the group of type 2 ribosome-inactivating proteins [52]. SNA-II is the second lectin isolated from the elderberry bark tissue, and it exhibits high affinity for glycoconjugates and Type 14 pneumococcal polysaccharides having multiple terminal D-Gal groups [51]. SNA-II consists of two identical carbohydrate-binding B-chains [51], [52].

In the current lectin glycan array network, nineteen nodes represent both SNA-I and SNA-II lectins. Out of these nineteen SNA nodes, fifteen SNA-I nodes are from community 1 (1000180, 1000181, 1000183, 1000184 and 1000725), and community 3 (1002793, 1004421, 1004422, 1004701, 1004702, 1004703, 1004704, 1004705, 1004706 and 1004780). Similarly, SNA-II is represented by four nodes (1004707, 1004708, 1004709 and 1004710) enriched in community 3.

The 10 SNA-I nodes in community 3 show specificity for complex-type biantennary N-glycans (Table 2A). From this table we observe that almost all of the interacting glycans possess the determinant Neu5Acα2-6Gal or Neu5Acα2-6Galβ1-4GlcNAc (shown by bold text in the table). Another interesting point is that the glycans 527 and 479 exhibit low RFU values in Table 2. This could be due to the fact that these glycans contain the Neu5Acα2-3 sequence, which is known to decrease the binding of SNA [27]. On the other hand, 316 (Neu5Acα2-3Galβ1-4GlcNAcβ1-2Manα1-3(Neu5Acα2-6Galβ1-4GlcNAcβ1-2Manα1-6)Manβ1-4GlcNAcβ1-4GlcNAcβ-Sp12) contains two sequences, one (Neu5Acα2-6Galβ1-4GlcNAc) increasing the binding and the other (Neu5Acα2-3) decreasing the binding.


Table 2. Three types of complex glycans for SNA proteins are listed.

doi:10.1371/journal.pone.0095480.t002

Compared to SNA-I nodes in community 3, five SNA-I nodes in community 1 (1000180, 1000181, 1000183, 1000184 and 1000725) interact with a smaller number of complex glycans (see Table 2B). Top 3 glycans possess either Neu5Acα2-6Gal or Neu5Acα2-6Galβ1-4GlcNAc and show RFU values greater than 40000. Two glycans from the second half of the table (glycans 60 and 59) show lower values of RFU because of the presence of the Neu5Acα2-3Gal sequence, which is known to decrease glycan binding. All these results are consistent with existing studies on the SNA specificity [27].

The 4 SNA-II nodes (1004707, 1004708, 1004709 and 1004710) in community 3 show preference for mainly mannose glycans or terminal GlcNAcβ1-4GlcNAcβ. Only two glycans (347 and 349) possess the determinant of Neu5Acα2-6Galβ1-4GlcNAc (Table 2C). In general, SNA-II is known to be Gal/GalNAc specific and is precipitated by glycoproteins, which consist of terminal GalNAc oligosaccharide chains [51]. Specifically, it shows higher affinity for D-GalNAc- and terminal N-acetyl-D-galactosaminyl disaccharides as compared to D-Gal. Conversely, the affinity exhibited by SNA-I for D-Gal and D-GalNAc- is identical [51]. However, SNA-I recognizes Neu5Acα2-6Gal [47] or Neu5Acα2-6Galβ1-4GlcNAc glycan sequence [27] with high specificity. Despite the differences in their glycan binding specificities, SNA-I and SNA-II share some similarities. For example, both lectins contain similar amino acid composition, while SNA-II contains more asparagine/aspartic acid, glycine and methionine residues [51]. Additionally, the carbohydrate-binding B-chains of both lectins show caspase-dependent apoptosis in different insect cell lines [52]. Considering their characteristic glycan binding specificities, SNA-I and SNA-II may play different functional roles in plants.

Binding Specificities of ConA

Concanavalin A (ConA) binds to a variety of eukaryotic cells through specific interactions with saccharide-containing cellular receptors, and has been widely used as a molecular probe in studies of cell membrane dynamics and cell division [54]. ConA typically binds to glucosyl and mannosyl residues at the non-reducing termini of oligo- or polysaccharides [48], [49] and it can also bind to non-terminal mannosyl residues [50]. The current network contains sixteen nodes of ConA (1000158 and 1000165 in community 1; 1000356 and 1000699 in community 2; and 1004459, 1004460, 1004461, 1004462, 1004464, 1004465, 1004466, 1004467, 1004468, 1002791, 1004412 and 1004413 in community 3), which mainly interact with mannose-containing glycans.

All ConA interacting glycan nodes from community 1, 2 and 3 are shown in Table 3A, 3B and 3C, respectively. ConA interacting glycan nodes in community 1 are either mannose sugars or biantennary complex glycans such as transferrin and AGP-B. On the other hand, the ConA nodes in community 2 show preference for terminal glucose glycans.


Table 3. The table shows all types of glycans interacting with ConA protein nodes.

doi:10.1371/journal.pone.0095480.t003

In comparison to communities 1 and 2, the ConA nodes in community 3 show high preference for mannose containing sugars especially “N-glycan, high mannose” (Table 3C). These results agree with existing reports on ConA’s binding structure and specificity for mannose containing structures [55]-[57], in addition to the recognition of biantennary glycans, complex N-glycans [58] and terminal glucose [57].

The agreement with existing studies on SNA-I [47] and ConA [55]-[57] supports the validity of the lectin-glycan interaction network and its detected community structure. Once a network is constructed, it is fairly easy to check whether a lectin binds to a certain glycan sequence by selecting the lectin node of interest and its first neighbors in the network. The lectins in different communities show a dramatic difference in their glycan binding specificities. The current network-based approach should provide a quick overall analysis of glycan microarray data on the lectin-glycan interaction without time-consuming calculations.
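Selecting a lectin and its first neighbors amounts to a lookup in the adjacency map of the network. The sketch below, with a hypothetical function name and invented identifiers, gathers the glycans bound by any node of a lectin (the union) and those bound by every node of that lectin (the intersection), which can be useful when one lectin is represented by several array IDs.

```python
def first_neighbors(lectin_nbrs, array_ids):
    """Return (union, shared) glycan sets for the given lectin array nodes."""
    neighbor_sets = [lectin_nbrs.get(array_id, set()) for array_id in array_ids]
    union = set().union(*neighbor_sets)
    shared = set.intersection(*neighbor_sets) if neighbor_sets else set()
    return union, shared

# Toy adjacency map invented for illustration.
toy_network = {"L1": {"G1", "G2"}, "L2": {"G2", "G3"}, "L3": {"G4"}}
bound_by_any, bound_by_all = first_neighbors(toy_network, ["L1", "L2"])
```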

Community Detection of the Lectin-glycan Interaction

We performed community detection of the lectin-glycan interaction network by using Mod-CSA [29], and compared the results with existing methods such as MCL [34], [35], MCODE [36] and the greedy algorithm [32], [38]. The number of identified communities and the modularity values obtained by the various community detection algorithms are shown in Table 4, Figure 4 and Figure 5.


Figure 4. Graphical representation of four communities identified by Mod-CSA is shown.

The figure provides an overall picture of the whole network with four main functional categories based on the p-value analysis.

doi:10.1371/journal.pone.0095480.g004

Figure 5. Communities generated by four methods are shown.

(a) Mod-CSA generated communities are shown. In each community, glycan nodes are represented by circles whereas protein nodes are shown as diamonds. All the nodes in the network have been assigned to a community. Community 1 has PP2A1 as a hub node whereas Community 4 has two hub nodes, WGA1 and RCA. (b) Greedy algorithm generated communities are shown. The nodes are color coded as per the Mod-CSA result. Each of the first three communities (communities 1 to 3) contains a hub node whereas communities 4-6 have only a few nodes. (c) MCODE generated communities are shown. Many nodes are not clustered, and the three hubs are grouped into one community. (d) MCL generated communities are shown. Many nodes are not clustered at all. Hub nodes are not clustered with any other nodes.

doi:10.1371/journal.pone.0095480.g005

Table 4. A summary of various clustering methods tested in this work.

doi:10.1371/journal.pone.0095480.t004

From Table 4, Figure 4 & Figure 5a–d, it is clear that Mod-CSA [29] outperforms the other clustering methods in terms of the modularity score as well as the number of nodes left unclassified. The only method comparable to our modularity score of 0.37 obtained by Mod-CSA was the fast greedy algorithm [32], [38] with a modularity score of 0.30. The algorithm recognizes clusters by repetitively eliminating edges from the network and then checks again which nodes are still connected [59]. The method detected 6 communities with the largest community containing 223 protein nodes and 298 glycan nodes (community 1) whereas the three smallest communities consist of either 4 nodes (community 4) or 3 nodes (community 5 & 6) only (see Figure 5b).

To compare the biological significance of the modules (communities) obtained by Mod-CSA and by the greedy algorithm, we calculated the numbers of statistically meaningful enriched clusters of lectins that bind to the same specific glycan. The glycan binding specificity of each protein node was identified either from the literature or from the UniProt database as described in the Methodology section, and the significance of each glycan specific cluster was assessed by calculating its p-value (p≤0.05). From Table 5, we observe that 44 statistically meaningful enriched clusters of lectins are identified by Mod-CSA with p-values ≤0.05, whereas only 33 enriched clusters are identified by the greedy algorithm. This result suggests that Mod-CSA identifies more functionally related lectin clusters than the greedy algorithm.


Table 5. Lists of statistically meaningful enriched clusters (p≤0.05) of lectins binding to the identical glycan are shown.

doi:10.1371/journal.pone.0095480.t005

For example, the greedy algorithm failed to identify 15 glycan specific lectin clusters (shown in bold in Table 5) that were identified by Mod-CSA. On the contrary, 3 glycan specific clusters (shown in italic bold in Table 5) were not detected by Mod-CSA, which are found by the greedy algorithm result. Specifically, the greedy algorithm failed to identify all fucose specific lectins, while Mod-CSA [29] successfully detected almost all fucose specific lectins and grouped them in community 1. Similarly, the greedy algorithm identified only five mannose related specificities in community 3, which is the major mannose binding community detected by greedy algorithm. However, Mod-CSA recognized eight mannose related specificities in community 1.

We compared our method with other popular clustering algorithms such as MCODE [36] and MCL [34], [35]. MCODE method divided the network into a total of 23 clusters with the modularity score of −0.036. The largest cluster consists of 56 nodes whereas the smallest cluster contains only 4 nodes. However, only 3 clusters contain more than 10 protein nodes and they were further analyzed for enrichment of glycan specific lectin groups. The statistical analysis of these 3 clusters resulted in only 4 statistically meaningful lectin groups. From Figure 5c, we observe that a large number of single nodes (791) are not clustered into any groups. This is because MCODE identifies clusters of tightly connected nodes and does not intend to assign every node in the network to a cluster [59]. The main reason for this could be the fact that the MCODE algorithm is sensitive to noise in the network, particularly to false positive interactions [60]. Consequently, only a small number of strongly connected clusters are identified by MCODE and the rest of the nodes remain unclustered, which makes it hard to extract information from the network.

Among all four methods tested, the MCL algorithm performed worst in terms of modularity, with a value of −0.815. MCL detected 33 clusters, with the largest cluster consisting of 340 nodes and the smallest cluster having 2 nodes (Figure 5d). Similar to MCODE, the MCL method detected only 3 clusters containing more than 10 protein nodes, and many nodes (689) in the network were not assigned to any group, which makes these unassigned nodes difficult to interpret. Therefore, these unassigned nodes were left out of further analysis. The MCL method resulted in only 12 statistically significant glycan specific groups.

If the performances of MCL and MCODE are hindered by false positive interactions, MCL and MCODE may perform better with networks generated using only reliable data. To find out if the Mod-CSA method outperforms the other methods regardless of the amount of potentially false information, we performed the enriched cluster analysis on two additional networks generated using more stringent RFU criteria, RFU ≥10000 and RFU ≥20000 (see Table S2). The ranking of the methods remains the same regardless of the RFU cutoff value used to generate the network. For example, the numbers of statistically significant glycan specific groups identified by Mod-CSA are 41 and 35 using RFU cutoff values of 10000 and 20000, respectively, whereas the greedy algorithm provides 23 and 20 statistically significant glycan specific groups. Similarly, with the MCL method, 20 and 14 statistically significant glycan specific groups were identified (see Table S3). Surprisingly, MCODE detected no statistically significant glycan specific lectin groups in the more stringent networks.

Finally, we compared the clusters obtained by Mod-CSA with random clusters. We divided the nodes into four random clusters with the same numbers of nodes as those detected by Mod-CSA. This process was iterated 20 times, and the average number of statistically enriched glycan-specific groups detected by random clustering was compared with that obtained by Mod-CSA. The maximum and minimum numbers of significantly enriched lectin groups were 11 and 1, respectively. On average, these 20 random permutations of clusters resulted in about 7 glycan-specific lectin groups having p-value ≤0.05 (see Table S4). A comparison of the number of significantly enriched lectin groups detected by the different clustering methods is shown in Figure 6. All these results demonstrate that Mod-CSA extracts more information than the other widely used clustering methods, and that it can serve as a powerful tool for investigating the lectin-glycan interaction.
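The random-cluster baseline can be sketched as a size-preserving shuffle followed by the same enrichment count. This is a simplified illustration under stated assumptions: every lectin carries exactly one specificity label, only lectin nodes are shuffled, and all function names, the seed, and the toy labels are invented here rather than taken from the study.

```python
import random
from math import comb

def hypergeom_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / comb(N, n)

def random_clusters(nodes, sizes, rng):
    """Shuffle the nodes and cut them into clusters of the requested sizes."""
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    clusters, start = [], 0
    for size in sizes:
        clusters.append(shuffled[start:start + size])
        start += size
    return clusters

def count_enriched(clusters, specificity, alpha=0.05):
    """Count (cluster, specificity) groups with hypergeometric p-value <= alpha."""
    N = sum(len(cluster) for cluster in clusters)
    totals = {}
    for cluster in clusters:
        for node in cluster:
            totals[specificity[node]] = totals.get(specificity[node], 0) + 1
    hits = 0
    for cluster in clusters:
        counts = {}
        for node in cluster:
            counts[specificity[node]] = counts.get(specificity[node], 0) + 1
        hits += sum(
            1 for label, k in counts.items()
            if hypergeom_tail(N, totals[label], len(cluster), k) <= alpha
        )
    return hits

# Toy annotation invented for illustration: 10 fucose and 10 mannose lectins.
specificity = {f"F{i}": "fucose" for i in range(10)}
specificity.update({f"M{i}": "mannose" for i in range(10)})
perfect = [[f"F{i}" for i in range(10)], [f"M{i}" for i in range(10)]]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
random_hits = [
    count_enriched(random_clusters(specificity, [10, 10], rng), specificity)
    for _ in range(20)
]
perfect_hits = count_enriched(perfect, specificity)
```

Comparing `perfect_hits` with the distribution in `random_hits` mirrors the comparison between the detected communities and the 20 random partitions.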


Figure 6. The number of statistically significant glycan-specific groups are shown for three networks generated with RFU cutoff values of 5000 (blue), 10000 (red), 20000 (green).

The random clusterings are generated using the four community results of Mod-CSA, and the average and the standard deviation is calculated from 20 runs.

doi:10.1371/journal.pone.0095480.g006

The Optimal Community Structure of the Lectin-glycan Interaction Network

It has been shown that Mod-CSA can provide globally optimal modularity partitioning of a network containing up to 2000 nodes [31]. Since our lectin-glycan network has 1119 nodes, we believe that the Mod-CSA result corresponds to the optimal grouping of the network in terms of its modularity. The optimal modularity grouping of lectins and glycans results in 4 communities with the modularity score of 0.37. We attempted to explore the relationship between all nodes within the same community on the basis of structure and function of each lectin and the type of glycan binding specificity. Each lectin node was assigned with its known glycan binding specificity, and the statistical significance of their grouping was assessed by calculating its p-value (p≤0.05) (see Table 5 and Figure 4). A brief description of each community is given below:

Community 1 (Fucose specific).

This is the largest community of the lectin-glycan network detected by Mod-CSA analysis and contains 168 protein nodes and 215 glycan nodes. This community is dominated by protein nodes of fucose specific lectins, such as Ulex europaeus agglutinin I (UEA-I), Aleuria aurantia lectin (AAL) and Ralstonia solanacearum lectin (RSL). The fucose binding sites of RSL are very similar to the five previously reported fucose-binding sites of AAL [61]. Fucose-containing xyloglucans are known to promote signaling consequences on plant tissues [62]. The other overrepresented lectins in this community have specificity for galactose and N-acetylgalactosamine, with cell adhesion as their main function. The most common protein domains corresponding to these galactose specific lectins are the H_lectin domain (PFAM ID: PF09458), which is involved in self/non-self recognition of cells through binding with carbohydrates [63], and the galactose-binding domain-like domain known as the Discoidin domain (PFAM ID: PF00754), which is found in many blood coagulation factors. The galactose specific lectins in this community include agglutinin from Helix pomatia, and Discoidin I and Discoidin II from Dictyostelium discoideum (slime mold). Additionally, the unannotated lectins in this cluster, such as 6RG, Tap1 and Mubin1, show specificity for galactose or fucose sugars (see Table S5), which suggests that these proteins are related to cell adhesion.

This community contains the top hub PP2A1 (1001943), which has the largest node degree of 257. The other three PP2A1 nodes (1002090, 1002091 and 1002092) belong to community 2. The unique glycans that interact with these PP2A1 nodes are summarized in Table S6. From this table it can be seen that PP2A1 nodes show specificity for a diverse range of glycans such as GlcNAc, high-mannose N-glycans and sialic acid containing glycans. Recently, Beneteau et al. (2010) [64] showed in their glycan array experiments that PP2A1 binds to different types of carbohydrates. This indicates the possibility that the phloem PP2 lectin plays roles in numerous functions, recognizing either endogenous glycoproteins or glycosylated receptors of pathogens. This diversity in glycan binding could be attributed to the presence of several carbohydrate-binding sites in PP2A1 [64].
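How a hub such as PP2A1 emerges from node degree can be illustrated with a minimal Python sketch. It assumes the network is built by keeping a lectin-glycan edge whenever the RFU signal reaches a cut-off (5000 here, the value quoted for the supplementary tables). The record layout, node IDs, RFU values and the function names `build_network` and `top_hubs` are hypothetical and are not taken from the array data.

```python
from collections import defaultdict

# Hypothetical (lectin node, glycan node, RFU) records; the IDs and RFU values
# are invented for illustration only.
records = [
    ("L1", "G1", 25000), ("L1", "G2", 8000), ("L1", "G3", 5200),
    ("L2", "G1", 6000), ("L2", "G4", 1200),
    ("L3", "G2", 400), ("L3", "G3", 7000),
]

def build_network(records, rfu_cutoff=5000):
    """Keep an edge only when the RFU signal reaches the cut-off."""
    adjacency = defaultdict(set)
    for lectin, glycan, rfu in records:
        if rfu >= rfu_cutoff:
            adjacency[lectin].add(glycan)
            adjacency[glycan].add(lectin)
    return adjacency

def top_hubs(adjacency, nodes, k=3):
    """Rank the given nodes by degree (number of distinct neighbours)."""
    ranked = sorted(nodes, key=lambda n: (-len(adjacency[n]), n))
    return [(n, len(adjacency[n])) for n in ranked[:k]]

network = build_network(records)
lectins = sorted({lectin for lectin, _, _ in records})
hubs = top_hubs(network, lectins)
print(hubs)
```

In this toy case the lectin with three retained edges ranks first; in the real network the same ranking would place the PP2A1 node with degree 257 at the top.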

Community 2 (Galb1-3GalNAc specific).

This is the smallest community, with 98 protein nodes and 133 glycan nodes. Community 2 is rich in N-acetylglucosamine and N-acetylgalactosamine binding lectins such as Wheat Germ Agglutinin (WGA), Griffonia simplicifolia II (GS-II), and Sclerotium rolfsii lectin (SRL). WGA belongs to a highly conserved family of chitin-binding lectins from cereals (Gramineae), such as rye, barley, rice and wheat [65]. Chitin, a polymer of β-1,4-N-acetylglucosamine, is present in the cell wall of many fungi, in the exoskeleton and digestive tract of some insects, and in some nematodes [66]. Similarly, GS-II, also an N-acetylglucosamine-specific legume lectin, has insecticidal activity against cowpea weevil [67]. In contrast to WGA and GS-II, SRL displays strong binding to the O-linked galactose-beta-1,3-N-acetylgalactosamine disaccharide (Thomsen-Friedenreich antigen), similar to Agaricus bisporus lectin [68]. Similarly, the other N-acetylgalactosamine specific lectins in this group are involved in binding the T-antigen structure Gal-beta1,3-GalNAc, e.g. Agglutinin alpha chain (Jacalin alpha chain) from Artocarpus integer (jack fruit) and Agglutinin alpha chain (MPA) from Maclura pomifera (Osage orange). Unannotated protein nodes are represented by lectins such as Protein PHLOEM PROTEIN 2-LIKE A1 (PP2A1) from Arabidopsis thaliana and Codium fragile lectin (CFT) from Codium fragile (dead man's fingers, a green alga). PP2A1 is known to interact with diverse types of carbohydrates and may be involved in numerous recognition functions [64]. On the other hand, CFT shows a preference for the α-anomer of GalNAc and recognizes GalNAca1 sequences, with high affinity for the Forssman pentasaccharide and for Galb1->3GalNAc-a- [69], which is one of the overrepresented (p-value <0.05) glycan specific groups in this community. Lists of unique glycans for PP2A1 and CFT nodes are summarized in Table S7.

Community 3 (Mannose specific).

Protein nodes in this group are predominantly mannose binding lectins, and nine out of twelve statistically significant glycan groups are mannose specific. Many of these mannose specific lectins have the B_lectin (PFAM ID: PF01453) structural domain. The members of this family are mannose specific and belong to the bulb lectin super-family (Amaryllidaceae, Orchidaceae and Alliaceae). For example, Galanthus nivalis agglutinin (GNA), a mannose-specific lectin from snowdrop bulbs, is a tetrameric member of the family of Amaryllidaceae lectins that exhibit antiviral activity towards HIV [70]. Other mannose binding lectins in this group have the Lectin_legB (PFAM ID: PF00139) structural domain and require metal ions such as Ca and Mn for carbohydrate binding and cell-agglutinating activities. Examples include ConA and garden pea lectin. The group also includes various high mannose binding lectins such as Hippeastrum hybrid lectin (HHL), Narcissus pseudonarcissus agglutinin (NPA), Salt stress-induced protein, and Allium sativum agglutinin (ASA). Another mannose binding lectin in this group with antiviral activity is Cyanovirin-N (CV-N). The antiviral activity of CV-N is mediated through specific interactions with the viral surface envelope glycoproteins gp120 and gp41, as well as with high-mannose oligosaccharides found on the HIV envelope [71].

Other lectins grouped in this community for which we could not find a reported glycan specificity include Arum maculatum agglutinin (AMA), Caragana arborescens agglutinin (CAA), Colchicum autumnale lectin (CA), and Arisaema helleborifolium schott lectin (AHL). All these lectins also show high specificity for mannose sugars (Table S8). Overall, the community consists of 147 protein nodes and 124 glycan nodes.
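The idea of a statistically significant specificity group within a community, such as the mannose groups above, can be illustrated with an over-representation test. The sketch below assumes a one-sided hypergeometric test, a common choice for this kind of enrichment. The test actually used for Table 5 is not restated in this section, so this is an illustration rather than the authors' procedure. The function name `hypergeom_pvalue` and all counts are hypothetical.

```python
from math import comb

def hypergeom_pvalue(population, annotated, community, observed):
    """P(X >= observed) when `community` nodes are drawn without replacement
    from `population` nodes, of which `annotated` carry the specificity."""
    upper = min(annotated, community)
    total = comb(population, community)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, community - k)
        for k in range(observed, upper + 1)
    )
    return tail / total

# Invented toy numbers: 20 lectins, 6 annotated as mannose-binding,
# and a community of 5 lectins that contains 4 of the mannose binders.
p = hypergeom_pvalue(population=20, annotated=6, community=5, observed=4)
print(f"p = {p:.4f}")
```

With these toy counts the p-value falls below the 0.05 threshold used in the text, so the group would be reported as over-represented.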

Community 4 (GalNAc specific).

From Table 5 it can be observed that this community is enriched in GalNAc specific lectins such as Datura stramonium agglutinin (DSA), Soybean agglutinin (SBA), Vicia villosa agglutinin (VVA), and Bauhinia purpurea lectin (BPL). These galactose specific lectins may play a significant role in cell-agglutinating activities, e.g. VVA (Lectin B4) from Vicia villosa (hairy vetch). Another galactose-specific lectin in this group is a legume lectin known as Erythrina cristagalli lectin (ECL) [72]. Although its function in the legume is unknown, it has been shown that ECL possesses hemagglutinating activity and it is believed to be mitogenic for human T lymphocytes [73]. A large number of plant and fungal proteins that bind N-acetylglucosamine (e.g. solanaceous lectins of tomato and potato, plant endochitinases, the wound-induced proteins hevein, win1 and win2, and the Kluyveromyces lactis killer toxin alpha subunit) contain a chitin-binding domain (PFAM ID: PF00187). These proteins might function as a defence against chitin containing pathogens, e.g. Chitin-binding lectin 1 of Solanum tuberosum (potato). This community also includes lectins such as Macrolepiota procera agglutinin (MPA) and Laccaria bicolor lectin, both of which show high specificity for complex GalNAc glycans (Table S9). This community consists of 100 protein and 134 glycan nodes.

Additionally, this community includes two of the three hub nodes identified in the lectin-glycan array network. One of the hubs is the protein node (1004763) for wheat germ agglutinin (WGA) from Triticum vulgaris (wheat), whereas the second (1004668) represents Ricinus communis agglutinin (RCA) from Ricinus communis (castor bean). WGA is a stable homodimeric protein and exhibits specificity for N-acetylneuraminic acid and N-acetylglucosamine (GlcNAc) sugars. The glycans for the WGA hub node are summarized in Table S10, and it can be observed that almost all of these glycans have a GlcNAc group, while a few others contain N-acetylneuraminic acid. Each monomeric unit of WGA consists of four domains (A–D), which can be further classified into “primary” (B and C domains) and “secondary” (A and D domains) binding sites showing dissimilar affinities for GlcNAc containing moieties [74]. These structural characteristics and the closeness of the binding sites make WGA a worthy candidate to explore multivalent protein-carbohydrate interactions and to assess the impact of structural modifications of glycoclusters [75]. These multivalent interactions are favorable compared to monomeric ones and are frequently employed by nature to control an array of diverse biological processes [76].

RCA as well as ECL recognize carbohydrate chains with non-reducing terminal β-d-galactose (Galβ) and show a preference for the Galβ1-4GlcNAc sequence over Galβ1-3GlcNAc [77], [78]. The diverse types of glycans, including Galβ1-4GlcNAc, that interact with the RCA hub node are listed in Table S11. The table also shows many Neu5Aca2-6Galb1 sugars having large RFU values. RCA is a glycoprotein from the seeds of castor plants and one of the most important applied lectins, widely used as a tool to study cell surfaces and to purify glycans [79]. RCA promotes binding and agglutination of polysaccharides and glycoproteins in addition to liposomes and micelles containing glycolipids with galactosyl residues [80], [81]. Furthermore, the specificities of interactions of RCA with neutral and sialylated oligosaccharides are well established and are consistent with our results as summarized in Table S11 [82].
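As a quick consistency check on the community sizes quoted in the four descriptions above, the protein and glycan counts can be summed and compared with the 1119 nodes of the full network. The short Python sketch below uses only the numbers reported in the text; the variable names are illustrative.

```python
# (protein nodes, glycan nodes) per community, as reported in the text.
community_sizes = {
    1: (168, 215),
    2: (98, 133),
    3: (147, 124),
    4: (100, 134),
}

# Total node count of each community and of the whole network.
totals = {c: p + g for c, (p, g) in community_sizes.items()}
network_size = sum(totals.values())
largest = max(totals, key=totals.get)
smallest = min(totals, key=totals.get)
print(totals, network_size, largest, smallest)
```

The four communities account for all 1119 nodes, with community 1 the largest and community 2 the smallest, in agreement with the descriptions.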

The current community-based network study of the lectin-glycan microarray data provides not only a quick and systematic analysis of lectin specificities, but also a global organization and grouping of biologically related lectins along with their binding partners (glycans). Such information will be vital to identify lectins that bind to particular glycan structures or to catalogue lectins according to the similarity of their specificities. Another important use of community-based network analysis is the characterization of a novel lectin and an initial guess about its specificity. For this, a sequence database should be constructed for each community identified, and a target lectin under investigation should be searched against these databases to get an idea about the structural/functional role of the query lectin and the type of glycans it might bind to. This approach will be more practical when the communities contain a large number of different lectins, and it might help in determining the glycan binding nature of a given lectin. There are many network-based protein function prediction methods, along with approaches utilizing structural or sequence information of proteins. Recently, for a protein-protein-interaction network, it has been shown that more accurate protein function prediction results were obtained by modularity based community detection of the network. The current study provides the first attempt to study lectin-carbohydrate interactions via community detection of a network.
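The modularity score on which the community detection rests can be made concrete with a small Python sketch. It assumes the standard Newman-Girvan definition [32], Q = Σ_c [e_c/m − (d_c/2m)²], applied to an unweighted network. Whether a bipartite-specific variant was used is not stated in this section. The sketch only evaluates Q for a given partition and does not implement the Mod-CSA search itself. The toy edges, node names and the function name `modularity` are hypothetical.

```python
from collections import defaultdict

def modularity(edges, membership):
    """Newman-Girvan modularity Q = sum_c [ e_c/m - (d_c/(2m))^2 ],
    where e_c is the number of edges inside community c, d_c the summed
    degree of its nodes, and m the total number of edges."""
    m = len(edges)
    inside = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            inside[membership[u]] += 1
    return sum(
        inside[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )

# Invented toy lectin-glycan edges (not from the array data).
edges = [
    ("lecA", "gly1"), ("lecA", "gly2"), ("lecB", "gly1"), ("lecB", "gly2"),
    ("lecC", "gly3"), ("lecC", "gly4"), ("lecD", "gly3"), ("lecD", "gly4"),
    ("lecB", "gly3"),
]
membership = {
    "lecA": 0, "lecB": 0, "gly1": 0, "gly2": 0,
    "lecC": 1, "lecD": 1, "gly3": 1, "gly4": 1,
}
q = modularity(edges, membership)
print(round(q, 4))
```

Placing every node in a single community gives Q = 0 under this definition, so a positive score such as the 0.37 reported for the lectin-glycan network indicates more intra-community edges than expected from node degrees alone.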

Conclusion

We have constructed a bipartite lectin-glycan interaction network from the collection of glycan microarray data. The network itself provides a quick and global view of the lectin-glycan interaction from which hub proteins are identified. We find that the hub proteins match well with the characteristics of known biological relevance. Using Mod-CSA, a recently developed efficient community detection method, 4 modules are identified. The clustering results are shown to be biologically more meaningful than those obtained by other widely used methods. Most significantly, 44 statistically significant glycan specific groups are identified, including fucose and mannose binding ones, some of which could not be detected by alternative methods. Even with stricter RFU cut-offs, clusters generated by Mod-CSA provide consistently better results than other methods. We provide an overall analysis of the 4 communities identified in the lectin-glycan microarray network. We also show how multiple lectins from the same plant, such as Sambucus nigra (SNA-I and SNA-II), are grouped into different communities based on their glycan binding specificities. The network study provides a framework to get a broad picture of data containing many interacting components. These capabilities of a community-based network analysis allow researchers to explore, analyze and compare a variety of proteins and glycans within the context of modules/communities identified in the network. We expect that this will trigger interest in the prediction of protein-carbohydrate interactions using biological networks and will have wider applications as additional glycan binding proteins are identified. The method can also be applied to study other types of lectins as well as other interaction networks.

Supporting Information

Table S1.

List of all protein nodes, their clusters and reported specificity in the lectin-glycan network.

doi:10.1371/journal.pone.0095480.s001

(XLS)

Table S2.

The list of meaningful glycan-specific groups and their P-values detected by Mod-CSA and greedy algorithm (GLAY) at RFU ≥10000 and RFU ≥20000.

doi:10.1371/journal.pone.0095480.s002

(XLS)

Table S3.

The list of meaningful glycan-specific groups and their P-values detected by MCL and MCODE at RFU ≥5000, RFU ≥10000 and RFU ≥20000.

doi:10.1371/journal.pone.0095480.s003

(XLS)

Table S4.

List of randomly identified statistically significant glycan-specific groups.

doi:10.1371/journal.pone.0095480.s004

(DOCX)

Table S5.

List of unique galactose and fucose sugars that interact with unannotated 6RG, Tap1, and Mubin at RFU ≥5000 in the lectin-glycan network.

doi:10.1371/journal.pone.0095480.s005

(XLS)

Table S6.

List of diverse glycans that interact with the hub PP2A1 at RFU ≥5000 in the lectin-glycan network.

doi:10.1371/journal.pone.0095480.s006

(XLS)

Table S7.

Lists of unique glycans for PP2A1 and CFT at RFU ≥5000 in the lectin-glycan network.

doi:10.1371/journal.pone.0095480.s007

(XLS)

Table S8.

List of unique glycans for unannotated lectins Arum maculatum agglutinin (AMA), Caragana arborescens agglutinin (CAA), Colchicum autumnale lectin (CA), and Arisaema helleborifolium schott lectin (AHL) that show high specificity for mannose sugars at RFU ≥5000 in the lectin-glycan network.

doi:10.1371/journal.pone.0095480.s008

(XLS)

Table S9.

List of complex glycans that show high specificity for lectins such as Macrolepiota procera agglutinin (MPA) and Laccaria bicolor lectin.

doi:10.1371/journal.pone.0095480.s009

(XLS)

Table S10.

List of diverse glycans that interact with the hub WGA at RFU ≥5000 in the lectin-glycan network.

doi:10.1371/journal.pone.0095480.s010

(XLS)

Table S11.

List of diverse glycans that interact with the hub RCA at RFU ≥5000 in the lectin-glycan network.

doi:10.1371/journal.pone.0095480.s011

(XLS)

Author Contributions

Conceived and designed the experiments: AM Juyong Lee Jooyoung Lee. Performed the experiments: AM Juyong Lee. Analyzed the data: AM Juyong Lee Jooyoung Lee. Contributed reagents/materials/analysis tools: AM Juyong Lee Jooyoung Lee. Wrote the paper: AM Juyong Lee Jooyoung Lee.

References

  1. Varki A, Cummings RD, Esko JD, Freeze HH, Stanley P, et al. (1999) Essentials of Glycobiology. Cold Spring Harbor Laboratory Press, New York.
  2. Shriver Z, Raguram S, Sasisekharan R (2004) Glycomics: a pathway to a class of new and improved therapeutics. Nat Rev Drug Discov 3: 863–873. doi: 10.1038/nrd1521
  3. Perrimon N, Bernfield M (2000) Specificities of heparin sulphate proteoglycans in developmental processes. Nature 404: 725–728. doi: 10.1038/35008000
  4. Ioffe E, Stanley P (1994) Mice lacking N-acetylglucosaminyltransferase activity die at mid-gestation, revealing an essential role for complex or hybrid N-linked carbohydrates. Proc Natl Acad Sci 91: 728–732. doi: 10.1073/pnas.91.2.728
  5. Jin L, Abrahams JP, Skinner R, Petitou M, Pike RN, et al. (1997) The anticoagulant activation of antithrombin by heparin. Proc Natl Acad Sci 94: 14683–14688. doi: 10.1073/pnas.94.26.14683
  6. Fu X, Albermann C, Jiang J, Liao J, Zhang C, et al. (2003) Antibiotic optimization via in vitro glycorandomization. Nat Biotechnol 21: 1467–1469. doi: 10.1038/nbt909
  7. Freeze HH (2006) Genetic defects in the human glycome. Nat Rev Genet 7: 537–551. Erratum in: Nat Rev Genet 7: 660. doi: 10.1038/nrg1937
  8. Feizi T, Fazio F, Chai W, Wong CH (2003) Carbohydrate microarrays - a new set of technologies at the frontiers of glycomics. Curr Opin Struct Biol 13: 637–645. doi: 10.1016/j.sbi.2003.09.002
  9. Imberty A, Lortat-Jacob H, Pérez S (2007) Structural view of glycosaminoglycan-protein interactions. Carbohydr Res 342: 430–439. doi: 10.1016/j.carres.2006.12.019
  10. Sharon N, Lis H (1972) Lectins: cell-agglutinating and sugar-specific proteins. Science 177: 949–959. doi: 10.1126/science.177.4053.949
  11. McCoy JP, Varani J, Goldstein IJ (1983) Enzyme-linked lectin assay (ELLA): use of alkaline phosphatase-conjugated Griffonia simplicifolia B4 isolectin for the detection of alpha-D galactopyranosyl end groups. Anal Biochem 130: 437–444. doi: 10.1016/0003-2697(83)90613-9
  12. Duverger E, Frison N, Roche AC, Monsigny M (2003) Carbohydrate-lectin interactions assessed by surface plasmon resonance. Biochimie 85: 167–179. doi: 10.1016/s0300-9084(03)00060-9
  13. Dam TK, Brewer CF (2002) Thermodynamic studies of lectin-carbohydrate interactions by isothermal titration calorimetry. Chem Rev 102: 387–429. doi: 10.1021/cr000401x
  14. Park S, Lee MR, Shin I (2007) Fabrication of carbohydrate chips and their use to probe protein-carbohydrate interactions. Nat Protoc 2: 2747–2758. doi: 10.1038/nprot.2007.373
  15. Taroni C, Jones S, Thornton JM (2000) Analysis and prediction of carbohydrate binding sites. Protein Eng 13: 89–98. doi: 10.1093/protein/13.2.89
  16. Shionyu-Mitsuyama C, Shirai T, Ishida H, Yamane T (2003) An empirical approach for structure-based prediction of carbohydrate binding sites on proteins. Protein Eng 16: 467–478. doi: 10.1093/protein/gzg065
  17. Malik A, Ahmad S (2007) Sequence and structural features of carbohydrate binding in proteins and assessment of predictability using a neural network. BMC Struct Biol 7: 1.
  18. Nassif H, Al-Ali H, Khuri S, Keirouz W (2009) Prediction of protein-glucose binding sites using support vector machines. Proteins 77: 121–132. doi: 10.1002/prot.22424
  19. Kulharia M, Bridgett SJ, Goody RS, Jackson RM (2009) InCa-SiteFinder: a method for structure-based prediction of inositol and carbohydrate binding sites on proteins. J Mol Graph Model 28: 297–303. doi: 10.1016/j.jmgm.2009.08.009
  20. Malik A, Firoz A, Jha V, Ahmad S (2010) PROCARB: A Database of Known and Modelled Carbohydrate-Binding Protein Structures with Sequence-Based Prediction Tools. Adv Bioinformatics 436036.
  21. Agarwal S, Mishra NK, Singh H, Raghava GP (2011) Identification of mannose interacting residues using local composition. PLoS One 6: e24039. doi: 10.1371/journal.pone.0024039
  22. Park S, Shin I (2002) Fabrication of carbohydrate chips for studying protein-carbohydrate interactions. Angew Chem Int Ed Engl 41: 3180–3182. doi: 10.1002/1521-3773(20020902)41:17<3180::aid-anie3180>3.0.co;2-s
  23. Wang D, Liu S, Trummer BJ, Deng C, Wang A (2002) Carbohydrate microarrays for the recognition of cross-reactive molecular markers of microbes and host cells. Nat Biotechnol 20: 275–281. doi: 10.1038/nbt0302-275
  24. Fukui S, Feizi T, Galustian C, Lawson AM, Chai W (2002) Oligosaccharide microarrays for high-throughput detection and specificity assignments of carbohydrate-protein interactions. Nat Biotechnol 20: 1011–1017. doi: 10.1038/nbt735
  25. Houseman BT, Mrksich M (2002) Carbohydrate Arrays for the Evaluation of Protein Binding and Enzyme Activity. Chem Biol 9: 443–454. doi: 10.1016/s1074-5521(02)00124-2
  26. Porter A, Yue T, Heeringa L, Day S, Suh E, et al. (2010) A motif-based analysis of glycan array data to determine the specificities of glycan-binding proteins. Glycobiology 20: 369–380. doi: 10.1093/glycob/cwp187
  27. Smith DF, Song X, Cummings RD (2010) Use of glycan microarrays to explore specificity of glycan-binding proteins. Methods Enzymol 480: 417–444. doi: 10.1016/s0076-6879(10)80033-3
  28. Maupin KA, Liden D, Haab BB (2012) The fine specificity of mannose-binding and galactose-binding lectins revealed using outlier motif analysis of glycan array data. Glycobiology 22: 160–169. doi: 10.1093/glycob/cwr128
  29. Lee J, Gross SP, Lee J (2012) Mod-CSA: Modularity optimization by conformational space annealing. Phys Rev E Stat Nonlin Soft Matter Phys 85: 056702.
  30. Lee J, Lee J (2013) Hidden information revealed by optimal community structure from a protein-complex bipartite network improves protein function prediction. PLoS One 8: e60372. doi: 10.1371/journal.pone.0060372
  31. Lee J, Gross SP, Lee J (2013) Improved network community structure improves function prediction. Sci Rep 3: 2197. doi: 10.1038/srep02197
  32. Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69: 026113. doi: 10.1103/physreve.69.026113
  33. Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T (2011) Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27: 431–432. doi: 10.1093/bioinformatics/btq675
  34. van Dongen S (2000a) Graph Clustering by Flow Simulation. Unpublished doctoral dissertation. Centre for Mathematics and Computer Science, University of Utrecht, The Netherlands.
  35. van Dongen S (2000b) MCL - an algorithm for clustering graphs. Available: http://micans.org/mcl/.
  36. Bader GD, Hogue CW (2003) An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 4: 2.
  37. Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, et al. (2011) clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 12: 436. doi: 10.1186/1471-2105-12-436
  38. Su G, Kuchinsky A, Morris JH, States DJ, Meng F (2010) GLay: community structure analysis of biological networks. Bioinformatics 26: 3135–3137. doi: 10.1093/bioinformatics/btq596
  39. Miyagawa S, Maeda A, Takeishi S, Ueno T, Usui N, et al. (2013) A lectin array analysis for wild-type and α-Gal-knockout pig islets versus healthy human islets. Surg Today 43: 1439–1447. doi: 10.1007/s00595-013-0569-6
  40. Kletter D, Cao Z, Bern M, Haab B (2013) Determining lectin specificity from glycan array data using motif segregation and GlycoSearch software. Curr Protoc Chem Biol 5: 157–169. doi: 10.1002/9780470559277.ch130028
  41. Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, et al. (2005) The Universal Protein Resource (UniProt). Nucleic Acids Res 33: D154–159. doi: 10.1093/nar/gki070
  42. Hunter S, Jones P, Mitchell A, Apweiler R, Attwood TK, et al. (2012) InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res 40: D306–312. doi: 10.1093/nar/gkr948
  43. Albert R (2005) Scale-free networks in cell biology. J Cell Sci. 118: 4947–4957. doi: 10.1242/jcs.02714
  44. Wu XR, Zhu Y, Li Y (2005) Analyzing protein interaction networks via random graph model. Int. J. Inf. Technol 11: 125–132.
  45. Higurashi M, Ishida T, Kinoshita K (2008) Identification of transient hub proteins and the possible structural basis for their multiple interactions. Protein Sci 17: 72–78. doi: 10.1110/ps.073196308
  46. Jeong H, Mason SP, Barabási AL, Oltvai ZN (2001) Lethality and centrality in protein networks. Nature 411: 41–42. doi: 10.1038/35075138
  47. Shibuya N, Goldstein IJ, Broekaert WF, Nsimba-Lubaki M, Peeters B, et al. (1987) The elderberry (Sambucus nigra L.) bark lectin recognizes the Neu5Ac(alpha 2–6)Gal/GalNAc sequence. J Biol Chem 262: 1596–1601.
  48. Goldstein IJ, Hollerman CE, Smith EE (1965) Protein-carbohydrate interaction. II. Inhibition studies on the interaction of concanavalin A with polysaccharides. Biochemistry 4: 876–883. doi: 10.1021/bi00881a013
  49. Poretz RD, Goldstein IJ (1970) An examination of the topography of the saccharide binding sites of concanavalin A and of the forces involved in complexation. Biochemistry 9: 2890–2896. doi: 10.1021/bi00816a021
  50. Goldstein IJ, Reichert CM, Misaki A (1974) Interaction of concanavalin A with model substrates. Ann N Y Acad Sci 234: 283–296. doi: 10.1111/j.1749-6632.1974.tb53040.x
  51. Kaku H, Peumans WJ, Goldstein IJ (1990) Isolation and characterization of a second lectin (SNA-II) present in elderberry (Sambucus nigra L.) bark. Arch Biochem Biophys 277: 255–262. doi: 10.1016/0003-9861(90)90576-k
  52. Shahidi-Noghabi S, Van Damme EJ, Iga M, Smagghe G (2010) Exposure of insect midgut cells to Sambucus nigra L. agglutinins I and II causes cell death via caspase-dependent apoptosis. J Insect Physiol 56: 1101–1107. doi: 10.1016/j.jinsphys.2010.03.012
  53. Van Damme EJ, Barre A, Rougé P, Van Leuven F, Peumans WJ (1996) The NeuAc(alpha-2,6)-Gal/GalNAc-binding lectin from elderberry (Sambucus nigra) bark, a type-2 ribosome-inactivating protein with an unusual specificity and structure. Eur J Biochem 235: 128–137. doi: 10.1111/j.1432-1033.1996.00128.x
  54. Reeke GN Jr, Becker JW, Edelman GM (1975) The Covalent And Three-Dimensional Structure Of concanavalin A. IV. Atomic coordinates, hydrogen bonding, and quaternary structure. J Biol Chem 250: 1525–1547.
  55. Hardman KD, Ainsworth CF (1972) Structure of concanavalin A at 2.4-A resolution. Biochemistry 11: 4910–4919. doi: 10.1021/bi00776a006
  56. Naismith JH, Field RA (1996) Structural basis of trimannoside recognition by concanavalin A. J Biol Chem. 271: 972–976. doi: 10.1074/jbc.271.2.972
  57. Gupta D, Dam TK, Oscarson S, Brewer CF (1997) Thermodynamics of lectin-carbohydrate interactions. Binding of the core trimannoside of asparagine-linked carbohydrates and deoxy analogs to concanavalin A. J Biol Chem 272: 6388–6392. doi: 10.1074/jbc.272.10.6388
  58. Moothoo DN, Naismith JH (1999) A general method for co-crystallization of concanavalin A with carbohydrates. Acta Crystallogr D Biol Crystallogr 55: 353–355. doi: 10.1107/s0907444998008919
  59. Koh GC, Porras P, Aranda B, Hermjakob H, Orchard SE (2012) Analyzing protein-protein interaction networks. J Proteome Res 11: 2014–2031. doi: 10.1021/pr201211w
  60. Brohée S, van Helden J (2006) Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics 7: 488. doi: 10.1186/1471-2105-7-488
  61. Wimmerova M, Mitchell E, Sanchez JF, Gautier C, Imberty A (2003) Crystal structure of fungal lectin: six-bladed beta-propeller fold and novel fucose recognition mode for Aleuria aurantia lectin. J Biol Chem 278: 27059–27067. doi: 10.1074/jbc.m302642200
  62. Darvill A, Augur C, Bergmann C, Carlson RW, Cheong JJ, et al. (1992) Oligosaccharins–oligosaccharides that regulate growth, development and defence responses in plants. Glycobiology 2: 181–198. doi: 10.1093/glycob/2.3.181
  63. Sanchez JF, Lescar J, Chazalet V, Audfray A, Gagnon J, et al. (2006) Biochemical and structural analysis of Helix pomatia agglutinin. A hexameric lectin with a novel fold. J Biol Chem 281: 20171–20180. doi: 10.1074/jbc.m603452200
  64. Beneteau J, Renard D, Marché L, Douville E, Lavenant L, et al. (2010) Binding properties of the N-acetylglucosamine and high-mannose N-glycan PP2-A1 phloem lectin in Arabidopsis. Plant Physiol 153: 1345–1361. doi: 10.1104/pp.110.153882
  65. Raikhel NV, Lee HI, Broekaert WF (1993) Structure and function of chitin-binding proteins. Annu. Rev. Plant Physiol. Plant Mol. Biol 44: 591–615. doi: 10.1146/annurev.pp.44.060193.003111
  66. Lerner DR, Raikhel NV (1992) The gene for stinging nettle lectin (Urtica dioica agglutinin) encodes both a lectin and a chitinase. J Biol Chem 267: 11085–11091. Erratum in: J Biol Chem 1992 267: 22694.
  67. Zhu K, Huesing JE, Shade RE, Bressan RA, Hasegawa PM, et al. (1996) An insecticidal N-acetylglucosamine-specific lectin gene from Griffonia simplicifolia (Leguminosae). Plant Physiol 110: 195–202. doi: 10.1104/pp.110.1.195
  68. Sathisha GJ, Prakash YK, Chachadi VB, Nagaraja NN, Inamdar SR, et al. (2008) X-ray sequence ambiguities of Sclerotium rolfsii lectin resolved by mass spectrometry. Amino Acids 35: 309–320. doi: 10.1007/s00726-007-0624-y
  69. Wu AM, Song SC, Chang SC, Wu JH, Chang KS, et al. (1997) Further characterization of the binding properties of a GalNAc specific lectin from Codium fragile subspecies tomentosoides. Glycobiology 7: 1061–1066. doi: 10.1093/glycob/7.8.1061
  70. Wright CS, Hester G (1996) The 2.0 A structure of a cross-linked complex between snowdrop lectin and a branched mannopentaose: evidence for two unique binding modes. Structure 4: 1339–1352. doi: 10.1016/s0969-2126(96)00141-4
  71. Wlodawer A, Botos I (2003) Cyanovirin-N: a sugar-binding antiviral protein with a new twist. Cell Mol Life Sci 60: 277–287. doi: 10.1007/s000180300023
  72. Turton K, Natesh R, Thiyagarajan N, Chaddock JA, Acharya KR (2004) Crystal structures of Erythrina cristagalli lectin with bound N-linked oligosaccharide and lactose. Glycobiology 14: 923–929. doi: 10.1093/glycob/cwh114
  73. Iglesias JL, Lis H, Sharon N (1982) Purification and properties of a D-galactose/N-acetyl-D-galactosamine-specific lectin from Erythrina cristagalli. Eur J Biochem 123: 247–252. doi: 10.1111/j.1432-1033.1982.tb19760.x
  74. Wright CS (1992) Crystal structure of a wheat germ agglutinin/glycophorin-sialoglycopeptide receptor complex. Structural basis for cooperative lectin-cell binding. J Biol Chem 267: 14345–14352.
  75. Fiore M, Berthet N, Marra A, Gillon E, Dumy P, et al. (2013) Tetravalent glycocyclopeptide with nanomolar affinity to wheat germ agglutinin. Org Biomol Chem 11: 7113–7122. doi: 10.1039/c3ob41203b
  76. Masaka R, Ogata M, Misawa Y, Yano M, Hashimoto C, et al. (2010) Molecular design of N-linked tetravalent glycosides bearing N-acetylglucosamine, N,N'-diacetylchitobiose and N-acetyllactosamine: Analysis of cross-linking activities with WGA and ECA lectins. Bioorg Med Chem 18: 621–629. doi: 10.1016/j.bmc.2009.12.006
  77. Itakura Y, Nakamura-Tsuruta S, Kominami J, Sharon N, Kasai K, et al. (2007) Systematic comparison of oligosaccharide specificity of Ricinus communis agglutinin I and Erythrina lectins: a search by frontal affinity chromatography. J Biochem 142: 459–469. doi: 10.1093/jb/mvm153
  78. Tateno H, Mori A, Uchiyama N, Yabe R, Iwaki J, et al. (2008) Glycoconjugate microarray based on an evanescent-field fluorescence-assisted detection principle for investigation of glycan-binding proteins. Glycobiology 18: 789–798. doi: 10.1093/glycob/cwn068
  79. Wu AM, Wu JH, Singh T, Lai LJ, Yang Z, et al. (2006) Recognition factors of Ricinus communis agglutinin 1 (RCA(1)). Mol Immunol 43: 1700–1715. doi: 10.1016/j.molimm.2005.09.008
  80. Kawaguchi T, Tagawa K, Senda F, Matsunaga F, Kitano H (1999) Recognition of Amphiphiles with Many Pendent Galactose Residues by Ricinus communis Agglutinin. J. Colloid Interface Sci 210: 290–295. doi: 10.1006/jcis.1998.5976
  81. Cartellieri S, Helmholz H, Niemeyer B (2001) Preparation and evaluation of Ricinus communis agglutinin affinity adsorbents using polymeric supports. Anal Biochem 295: 66–75. doi: 10.1006/abio.2001.5177
  82. Wang Y, Yu G, Han Z, Yang B, Hu Y, et al. (2011) Specificities of Ricinus communis agglutinin 120 interaction with sulfated galactose. FEBS Lett 585: 3927–3934. doi: 10.1016/j.febslet.2011.10.035