Information Indices with High Discriminative Power for Graphs

Matthias Dehmer; Martin Grabner; Kurt Varmuza

doi:10.1371/journal.pone.0031214

Abstract

In this paper, we evaluate the uniqueness of several information-theoretic measures for graphs based on so-called information functionals and compare the results with other information indices and non-information-theoretic measures such as the well-known Balaban index. We show that, by employing an information functional based on degree-degree associations, the resulting information index outperforms the Balaban index tremendously. These results have been obtained by using nearly 12 million exhaustively generated, non-isomorphic and unweighted graphs. Also, we obtain deeper insights on these and other topological descriptors when exploring their uniqueness by using exhaustively generated sets of alkane trees representing connected and acyclic graphs in which the degree of a vertex is at most four.

Citation: Dehmer M, Grabner M, Varmuza K (2012) Information Indices with High Discriminative Power for Graphs. PLoS ONE 7(2): e31214. https://doi.org/10.1371/journal.pone.0031214

Editor: Dongxiao Zhu, Wayne State University, United States of America

Received: October 14, 2011; Accepted: January 4, 2012; Published: February 29, 2012

Copyright: © 2012 Dehmer et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: Matthias Dehmer, Martin Grabner and Kurt Varmuza thank the Austrian Science Funds for supporting this work (project P22029-N13). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

To quantify the topology of networks, numerous topological descriptors, which are also often referred to as graph measures or indices, have been developed [1]–[7]. A property thereof called the uniqueness, discriminative power or degeneracy has been investigated extensively in mathematical chemistry and structure-oriented drug design in the context of characterizing the structure of molecules quantitatively. In general, a descriptor is called degenerate if it possesses the same value for more than one graph. In this paper our main task is to examine the extent to which topological indices are degenerate.

We briefly review the most important contributions to tackle this problem, and start with a classical contribution due to Bonchev et al. [8], [9]. They proposed the so-called magnitude-based information indices for improving the discriminative power of other classical descriptors for alkane trees [8] and isomers [9]. Alkane trees are connected and acyclic graphs in which the degree of a vertex is at most four [10]. Following this, Raychaudhury et al. [11] analyzed the discriminative power of information-theoretic measures based on distances for chemical graphs containing one ring. Konstantinova et al. [12] explored the uniqueness of various information-theoretic and non-information-theoretic measures by using polycyclic structures representing cata-condensed benzenoid hydrocarbons. As a result, the Balaban index (see equation 20), the sum of local vertex entropies due to Konstantinova [12], [13] and the magnitude-based information indices turned out to be unique for this class of graphs; see [12]. However, note that the corresponding graph sets were rather small. Diudea et al. [14] recently explored a novel super-index based on shell matrices and polynomials. When this index was applied to the heterogeneous graph database MS2265 [15], which contains 2265 non-isomorphic skeleton graphs inferred from chemical compounds, and to chemical isomers, it showed no degeneracy [14]. Results for further topological descriptors applied to chemical graph databases can also be found in [14]. Hu and Xu [16] applied an index using layer matrices and powers of extended adjacency matrices to over two million weighted alkane isomers. The index was unique for all graph classes used [16], but we point out that it relies on bond types and 3D information.

In order to underpin the practical importance of exploring uniqueness, it seems reasonable that an appropriate graph measure to characterize the structure of networks quantitatively should be able to discriminate graphs properly (e.g., when slightly changing the structure of a network). Note that this problem has already been discussed in the context of complex networks; see [17]. As to applications thereof, Dehmer et al. [15] have already outlined that unique measures can serve as candidates for calculating the identification codes of networks (e.g., chemical structures), which could be used to perform fast structure searches in large databases. Also, such highly discriminating measures representing graph invariants (the measured value is invariant under graph isomorphisms [10]) can be useful to tackle the graph isomorphism problem, because, if the values of two graphs with the same number of vertices are different, they must be non-isomorphic. Hence, such indices could be employed to tackle the graph isomorphism problem in large databases, as the computational complexity of the measures is polynomial. That means instead of performing a thorough isomorphism test which may be computationally costly, highly unique graph measures could be used to filter out non-isomorphic graphs. Note that the time complexity of some of these measures has already been discussed in [15].
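The filtering idea described above can be sketched in a few lines. This is only an illustration under simple assumptions, not the procedure used in this study: graphs are given as adjacency dictionaries, the invariant is the sorted degree sequence (chosen because it is easy to compute, not because it is highly discriminating), and the function names `degree_sequence` and `bucket_by_invariant` are hypothetical.

```python
from collections import defaultdict


def degree_sequence(adj):
    """Sorted degree sequence of a graph given as {vertex: [neighbours]}."""
    return tuple(sorted(len(nbrs) for nbrs in adj.values()))


def bucket_by_invariant(graphs, invariant):
    """Group graph names by invariant value.

    Graphs in different buckets are certainly non-isomorphic; only graphs
    that share a bucket would still need a full isomorphism test.
    """
    buckets = defaultdict(list)
    for name, adj in graphs.items():
        buckets[invariant(adj)].append(name)
    return dict(buckets)


graphs = {
    "path4": {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]},
    "star4": {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]},
    "path4_relabelled": {0: [2], 2: [0, 3], 3: [2, 1], 1: [3]},
}
buckets = bucket_by_invariant(graphs, degree_sequence)
```

The more discriminating the invariant, the smaller the buckets and the fewer expensive isomorphism tests remain.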

The main contribution of this paper is to evaluate the discriminative power of selected topological indices in the context of complex networks, i.e., graphs that are neither regular nor random [18]. We applied several information-theoretic and non-information-theoretic measures, such as the Balaban index [19], to nearly 12 million exhaustively generated, non-isomorphic and unweighted graphs with the same number of vertices (see ‘Numerical results and interpretation’). Importantly, we only use unweighted graphs in this study, as it poses an extra challenge to the underlying descriptors to discriminate such graphs on a large scale. We emphasize that the Balaban index has often been referred to as one of the most discriminative indices (see e.g. [20]), as it is powerful when applied to several classes of isomers and alkane trees. Our study highlights the limitations of the Balaban index and other topological descriptors in terms of their ability to discriminate non-isomorphic graphs uniquely.

We show that one of the information indices due to Dehmer et al. [15], [21], which uses the information functional based on degree-degree associations, outperforms the Balaban index tremendously when these measures are applied to exhaustively generated graphs. We also employ other information measures for graphs using so-called information functionals that have been developed by Dehmer et al. [15], [21]. The discriminative power of some of these information measures and classical ones has already been evaluated in [22] specifically for chemical graphs possessing structural constraints. By contrast, we perform a large-scale study to compare the discriminative power of these information measures by employing three information functionals (see equations 7, 8, and 18) and non-information-theoretic indices such as the Balaban index using exhaustively generated graphs without structural constraints. The discriminative power obtained with these particular information functionals and with the Balaban index has not yet been investigated on a large scale.

The results can be interpreted as an attempt to evaluate the uniqueness of quantitative graph measures in the context of complex networks. To the best of our knowledge, very little work has so far been done to tackle this problem. One exception is the work of Kim et al. [17], who evaluated the discriminative power of graph complexity measures that were developed in the context of network physics. As a result, most of the complexity measures proposed in [17] turned out to show little discriminative power.

This paper is organized as follows. In the section ‘Topological descriptors’ we briefly recall the definitions of the information-theoretic measures due to Dehmer et al. and the other graph measures that we are going to use. The ‘Data and software’ section describes the datasets and sketches the steps to calculate the topological descriptors. In ‘Numerical results and interpretation’, we present and interpret the numerical results when evaluating the discriminative power of the measures. This includes a statistical analysis to investigate the dependence of the uniqueness of the Balaban index and of the degree–degree association index on the sample size by using exhaustively generated graphs with 10 vertices. The paper finishes with a ‘Summary and conclusion’.

Methods

Topological Descriptors

In this section, we briefly recall the definition of the information measures [4], [15], [21] that we are going to use in this study. Further, we outline the concept of distance-based descriptors, including the well-known Balaban index. In summary, Table 1 gives an overview of the descriptors that we use.

Table 1. The topological indices used and their symbols.

https://doi.org/10.1371/journal.pone.0031214.t001

Information Indices.

To start, we point out that, besides empirical properties of information measures for graphs [1], [4], [15], [21] (such as determining correlations between the measures [1]), mathematical problems (such as proving various upper and lower bounds of the measures) have also been explored; see [23], [24]. Note that the correlation ability between two graph measures generally relates to the problem of whether they capture structural information similarly [1], [9]. The so-called implicit information inequalities have been investigated extensively in [21], [25], [26]. Also, the class of graph entropy measures obtained by using certain information functionals based on the metric properties of graphs (such as the neighborhoods of atoms) has been used to solve problems in quantitative structure–activity relationships (QSARs) and quantitative structure–property relationships (QSPRs) [27]. In particular, Dehmer et al. [28] classified the mutagenicity of molecules by using these measures and employing supervised learning techniques.

Let $G = (V, E)$ be an arbitrary, finite, and unweighted graph; $|V|$ and $|E|$ denote the number of vertices and the number of edges, respectively. Throughout this paper, we use the symbol $|M|$ to express the cardinality (also called the size) of a set $M$. We denote by $\rho(G)$ the diameter of $G$; see [29]. The abstract information functionals $f$ [21] play a critical role when defining information measures on graphs. Based on these functionals, vertex probabilities [21]

$$p(v_i) := \frac{f(v_i)}{\sum_{j=1}^{|V|} f(v_j)} \tag{1}$$

have been assigned to each particular vertex $v_i$ of $G$. This makes the resulting measure independent of determining partitions of graph invariants [1], [8], [30], [31], which might be computationally difficult to obtain. By definition,

$$p(v_1) + p(v_2) + \cdots + p(v_{|V|}) = 1 \tag{2}$$

and therefore $(p(v_1), \ldots, p(v_{|V|}))$ forms a probability distribution. Using this approach and recalling Shannon's entropy [32] defined by

$$H(p) := -\sum_{i=1}^{n} p_i \log p_i \tag{3}$$

the families of information measures

$$I_f(G) := -\sum_{i=1}^{|V|} p(v_i) \log p(v_i) \tag{4}$$

$$I_f^{\lambda}(G) := \lambda \left( \log |V| + \sum_{i=1}^{|V|} p(v_i) \log p(v_i) \right) \tag{5}$$

have been developed [4], [15], [21]. These measures are families of entropic measures representing the structural information content of $G$. Here $\lambda > 0$ is a scaling constant, $I_f$ is the mean entropy of $G$, and $I_f^{\lambda}$ its information distance between maximum entropy and $I_f$.
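As a minimal sketch of equations 1, 4 and 5, the following functions turn a list of functional values, one per vertex, into vertex probabilities and the two entropy measures. The base-2 logarithm and the function names are assumptions made for illustration; the functional values themselves would come from one of the information functionals.

```python
import math


def vertex_probabilities(f_values):
    """Equation 1: normalise the functional values so that they sum to one."""
    total = sum(f_values)
    return [f / total for f in f_values]


def mean_entropy(f_values):
    """Equation 4: Shannon entropy of the vertex probabilities (base 2 assumed)."""
    probs = vertex_probabilities(f_values)
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_distance(f_values, lam=1.0):
    """Equation 5: scaled distance between maximum entropy and the mean entropy."""
    return lam * (math.log2(len(f_values)) - mean_entropy(f_values))


uniform = [1, 1, 1, 1]   # all vertices look alike: maximum entropy
skewed = [3, 1]          # one vertex dominates: lower entropy
```

For `uniform`, the entropy equals the maximum value of 2 bits and the information distance is zero; any deviation from uniform functional values lowers the entropy.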

In our analysis, we use three distinct information functionals, based on vertex spheres, on path lengths, and on degree–degree associations, together with the related information measures [4], [5], [21]. To define the first functional, we first define the $j$-sphere of a vertex $v_i$ by [21]

$$S_j(v_i, G) := \{ v \in V \mid d(v_i, v) = j,\ j \geq 1 \}. \tag{6}$$

The values $|S_j(v_i, G)|$ are just the $j$-sphere cardinalities. In general, $d(u, v)$ is the shortest distance between the vertices $u, v \in V$; see [33]. The sphere functional (equation 7) combines the $j$-sphere cardinalities of $v_i$, each weighted by a positive coefficient $c_j$. To define the second functional, the path lengths of the local information graph starting from a particular vertex $v_i$ have been used; see [21] for its detailed definition. Here, the path length for a given $j$ is the sum of the lengths of all shortest paths that start from $v_i$ and induce the local information graph; the resulting functional (equation 8) weights these sums in the same way. Finally, the third functional is based on degree–degree associations; see [34]. For each vertex of an undirected and unweighted graph, it uses the sets of shortest paths starting from that vertex (equations 9–11), the corresponding degree sequences (equations 12–14), and quantities derived from these degree sequences (equations 15–17); the functional itself is given by equation 18. As the functional employs differences taken from the degree sequences, the resulting graph entropies have been called degree–degree association indices; see [34]. Since all three functionals and the resulting entropies are parametric, we need to choose the coefficients $c_k$ for weighting the structural differences or characteristics of a graph. Note that the $c_k$ must be chosen such that at least two coefficients are distinct. This includes the parameter settings of equation 19, which have already been used in [15]. Other configurations of the $c_k$ have also been investigated to determine the structural complexity of chemical structures meaningfully [15].
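To make the $j$-sphere construction concrete, the sketch below computes the sphere cardinalities of every vertex by breadth-first search and combines them with decreasing coefficients. Several points are assumptions rather than statements about the original definitions: the weighted sum is used directly as the functional value, the `lin` scheme takes the coefficients diameter, diameter − 1, ..., 1, the `exp` scheme takes diameter · e^{−(j−1)}, and the function names are hypothetical. The exact forms should be checked against [15] and [21].

```python
import math
from collections import deque


def distances_from(adj, source):
    """Shortest-path distances from source in an unweighted graph (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def sphere_functional(adj, weights="lin"):
    """Weighted sum of j-sphere cardinalities for each vertex of a connected graph."""
    all_dist = {v: distances_from(adj, v) for v in adj}
    diameter = max(max(d.values()) for d in all_dist.values())
    if weights == "lin":      # assumed linear scheme
        coeff = [diameter - j + 1 for j in range(1, diameter + 1)]
    elif weights == "exp":    # assumed exponential scheme
        coeff = [diameter * math.exp(-(j - 1)) for j in range(1, diameter + 1)]
    else:
        raise ValueError("unknown weighting scheme: " + weights)
    result = {}
    for v, dist in all_dist.items():
        sizes = [0] * diameter            # sizes[j - 1] = |S_j(v, G)|
        for d in dist.values():
            if d >= 1:
                sizes[d - 1] += 1
        result[v] = sum(c * s for c, s in zip(coeff, sizes))
    return result


path3 = {0: [1], 1: [0, 2], 2: [1]}
values_lin = sphere_functional(path3, "lin")
```

On the path with three vertices, the two end vertices receive the same value and the middle vertex a larger one, because both of its neighbours lie in its 1-sphere, which carries the largest coefficient.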

Distance-Based Topological Descriptors.

Numerous topological descriptors have been explored by employing distances in a graph [7], [19], [29]. Seminal work was done by Skorobogatov and Dobrynin [29], who developed a theory on the metric properties of graphs. Also, several distance-based graph measures have been developed and analyzed; these indices have shown that distances in graphs capture significant information when applied in QSAR/QSPR; see [1], [7], [11], [19], [27].

We recall the definition of the Balaban index [7], [19] in detail, as we place emphasis on comparing its discriminative power with that of the three functional-based information indices on a large scale by using exhaustively generated graphs. The names and symbols of the remaining descriptors used in this study can be found in Table 1. For their formal definitions, see [1], [2], [7], [27].

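Now, we define the distance matrix [35] of a graph $G$ as $D := (d(v_i, v_j))_{v_i, v_j \in V}$. For each vertex $v_i$, $DS_i$ denotes the distance sum (row or column sum) obtained by adding the entries in the corresponding row or column of the distance matrix $D$. In addition, $\mu := |E| - |V| + 1$ is the cyclomatic number [36]. Then, the Balaban index is defined by [19]

$$J(G) := \frac{|E|}{\mu + 1} \sum_{(v_i, v_j) \in E} \left[ DS_i \, DS_j \right]^{-1/2}. \tag{20}$$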
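The Balaban index is simple to compute once the distance sums are known. The sketch below assumes a connected, unweighted graph stored as an adjacency dictionary and uses the commonly stated form of the index, i.e., the number of edges divided by the cyclomatic number plus one, multiplied by the sum over all edges of the inverse square root of the product of the two distance sums. The function names are hypothetical.

```python
import math
from collections import deque


def distance_sums(adj):
    """Distance sum of every vertex, computed with one BFS per vertex."""
    sums = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        sums[source] = sum(dist.values())
    return sums


def balaban_index(adj):
    """Balaban index J of a connected, unweighted graph."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    mu = m - n + 1                       # cyclomatic number
    ds = distance_sums(adj)
    # Every undirected edge appears twice in the adjacency lists, hence 0.5.
    edge_sum = 0.5 * sum(
        1.0 / math.sqrt(ds[u] * ds[w]) for u in adj for w in adj[u]
    )
    return m / (mu + 1) * edge_sum


path3 = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
j_path3 = balaban_index(path3)
j_triangle = balaban_index(triangle)
```

For the path with three vertices the distance sums are 3, 2 and 3, which gives J = 4/√6; for the triangle all distance sums equal 2 and the cyclomatic number is 1, which gives J = 2.25.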

Results

Data and Software

Let us now state the definitions and generation procedure of the graphs for performing our analysis.

Definition 1. For a given number of vertices, the first graph class is the set of all exhaustively generated non-isomorphic and connected graphs with that number of vertices.

Practically, these sets have been generated by using the program geng from the Nauty package [37]. The cardinalities of the classes used in this study are in accordance with the results due to McKay [37], [38].

Definition 2. For a given number of vertices, the second graph class is the set of all exhaustively generated non-isomorphic alkane trees with that number of vertices.

The chemical structures represented by alkane trees with a carbon backbone have been generated with Molgen [39]. In particular, we generated four such classes.

Then, for both classes (see Definitions 1 and 2), the structure information has been converted into the graphNEL format to calculate the descriptors in R [40] by employing the QuACN package [41]. This package contains R functions for over a hundred topological descriptors.

Numerical Results and Interpretation

In this section, we present the numerical results when evaluating the discriminative power of the information indices, Balaban index and other topological descriptors. Results on exhaustively generated graphs are summarized in Tables 2 and 3, while those on alkane trees are given in Table 5. In total, we evaluated the discriminative power of 27 graph measures.

Table 2. Three exhaustive sets of non-isomorphic and connected graphs.

https://doi.org/10.1371/journal.pone.0031214.t002

Table 3. Exhaustive sets of non-isomorphic graphs.

https://doi.org/10.1371/journal.pone.0031214.t003

Evaluation of the Discriminative Power Using Exhaustively Generated Graphs.

To interpret the numerical results, we start by considering Table 3 and observe that the sensitivity due to Konstantinova [12], i.e., the fraction of graphs in a class that an index can distinguish, decreases for the Balaban index with an increasing number of vertices; see also the ‘Statistical analysis’ section. Throughout this paper, ndv (non-distinguishable values) stands for the number of non-isomorphic graphs whose values cannot be distinguished by a particular index [12]. For example, in the smaller class of Table 3, 61.6623% of the graphs could be distinguished (i.e., have unique values) by the Balaban index. For the graphs with 10 vertices, only 20.5633% out of almost 12 million exhaustively generated non-isomorphic graphs could be distinguished by the Balaban index. But we can see in Table 3 that the information indices using the information functional approach [4], [15], [21] sketched in the ‘Information indices’ section can discriminate our graphs comparatively well. In particular, the degree–degree association index with an exponential weighting scheme (equation 21) successfully discriminates 94.8005% of almost 12 million exhaustively generated graphs. In view of the large number and complexity of the graphs, the uniqueness of this index is striking. Observe that, for all weighting schemes [15], i.e., lin, quad, and exp, the index based on $j$-sphere cardinalities is much less discriminative. This shows that the underlying information functional is crucial for reaching uniqueness of the information index. Also, we can clearly see that the uniqueness of other indices shown in Table 3 is quite low. The Balaban index is nevertheless among the best of the known measures that we have chosen for this study.
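The quantities ndv and sensitivity can be computed directly from the list of index values of a graph class. The following sketch is an illustration only; the rounding precision `decimals` used to decide when two floating-point values count as equal is an assumption, as are the function names.

```python
from collections import Counter


def ndv(values, decimals=10):
    """Number of graphs whose index value is shared with at least one other graph."""
    counts = Counter(round(v, decimals) for v in values)
    return sum(c for c in counts.values() if c > 1)


def sensitivity(values, decimals=10):
    """Fraction of graphs with a unique index value: (n - ndv) / n."""
    return (len(values) - ndv(values, decimals)) / len(values)


index_values = [1.0, 2.0, 2.0, 3.5, 3.5, 3.5, 4.0]
n_degenerate = ndv(index_values)
share_unique = sensitivity(index_values)
```

In this toy list, five of the seven values occur more than once, so ndv is 5 and the sensitivity is 2/7. The chosen precision matters in practice, since a coarse rounding makes an index look more degenerate than it is.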

Interestingly, the situation is somewhat the opposite when considering Table 2. Namely, for two of the classes, the discriminative power of the Balaban index is higher than that of some of the information measures based on the information functional approach. Also, we see that the underlying weighting scheme for the coefficients matters a lot, because one weighted variant has a higher discriminative power than the Balaban index for two of the classes. In summary, we hypothesize that the Balaban index performs well if the cardinality of the underlying graph set and the order of the involved graphs are rather small. By using a statistical approach, we will examine this hypothesis in the ‘Statistical analysis’ section. Let us give another example to shed light on the degeneracy of the measures when applying them to individual graphs; see Figure 1 and Table 4. Figure 1 shows four sample graphs, two of which are structurally quite similar in the following sense: removing one particular edge from each of them yields isomorphic graphs. From Table 4, we see that these graphs can only be fully distinguished by the degree-degree association index. Evaluating the Balaban index on these graphs leaves two of them degenerate. In contrast, the index due to Konstantinova cannot discriminate two of the graphs. Finally, we observe that one of the indices cannot discriminate any of the four example graphs. This implies that every measure captures structural information differently and, hence, its discriminative power can differ dramatically because of

(i) the underlying paradigm to define a graph measure, e.g., information-theoretic vs. non-information-theoretic indices or partition-based vs. non-partition-based indices, and
(ii) the underlying graph invariant to define a measure, e.g., degrees, distances, or several graph invariants.

Figure 1. Four example graphs.

https://doi.org/10.1371/journal.pone.0031214.g001

Table 4. Index values for the four example graphs depicted in Figure 1.

https://doi.org/10.1371/journal.pone.0031214.t004

A comparison of these measures with others (e.g., see Table 3) is critical, as the measures rely on different concepts (e.g., information-theoretic vs. non-information-theoretic indices). In the following, we give plausible reasons why the measures using the information functional approach often capture structural information of exhaustively generated graphs more uniquely and significantly than other information measures for graphs that are based on determining partitions of graph invariants. This can also be underpinned by the numerical results; see Tables 2 and 3. Examples of the latter measures are the magnitude-based information indices due to Bonchev et al. [8], the degree information index [1] and the topological information content of a graph [31], [42].

To construct classical partition-based measures of a graph $G$, we start with a graph invariant $X$ and induce a partitioning according to an equivalence criterion. This results in the equivalence classes $X_1, \ldots, X_k$. The mean entropy is then given by

$$\bar{I}(G) := -\sum_{i=1}^{k} \frac{|X_i|}{|X|} \log \frac{|X_i|}{|X|}. \tag{22}$$

The process of inducing the partitionings might be the reason for obtaining non-unique indices, as many structurally different graphs could possess the same or similar partitionings when using a certain equivalence criterion, e.g., vertex degree equality [1] or topologically equivalent vertices [31], [42].
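A small example illustrates why partition-based measures degenerate easily. The sketch below uses vertex degree equality as the equivalence criterion and a base-2 logarithm (both choices are assumptions for illustration), and evaluates the mean entropy on two non-isomorphic graphs with four vertices: a path and a 4-cycle with one chord.

```python
import math
from collections import Counter


def degree_partition_entropy(adj):
    """Mean entropy of the partition of the vertex set into classes of equal degree."""
    n = len(adj)
    class_sizes = Counter(len(nbrs) for nbrs in adj.values()).values()
    return -sum((s / n) * math.log2(s / n) for s in class_sizes)


# Path on four vertices: degrees 1, 2, 2, 1.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
# 4-cycle with the chord 0-2: degrees 3, 2, 3, 2.
diamond = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}

h_path4 = degree_partition_entropy(path4)
h_diamond = degree_partition_entropy(diamond)
```

Both graphs split their vertices into two degree classes of size two, so the entropy is 1 bit in both cases, although one graph is a tree and the other contains two triangles. Only the class sizes enter the measure, not the structure that produced them.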

In order to derive information measures using the information functional approach, we assign a probability value (see equation 1) to each individual vertex in a graph by using a certain information functional capturing its structural information. Examples thereof are equations 7 and 18. That means the information measures given by equations 4 and 5 can be understood as a cumulation of local quantities representing the vertex probabilities. Clearly, each such quantity captures a certain share of the structure of the graph. As the numerical results show, these measures conserve structural information better than the partition-based ones and result in highly discriminating measures for several graph classes. Note that other classical descriptors (see Tables 2 and 3), such as the Harary index, the Randić index [43], [44] and the complexity index, rely on the simple derivation of structural quantities (e.g., distances or degrees) to obtain a single numerical value characterizing the complexity of the graph. Consequently, their discriminative power is very low; see Tables 2 and 3.

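When evaluating the uniqueness (see the ndv or sensitivity values) of the information indices based on $j$-sphere cardinalities and on degree–degree associations (see Table 3), we observe that the difference between the resulting values is tremendous. Note that the exhaustively generated graphs contain cycles. A plausible reason for this difference is given in Figure 2.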

Figure 2. Left: A cyclic graph and its $j$-sphere cardinalities for each vertex. Right: The values of the degree–degree association functional for each vertex of the same graph.

https://doi.org/10.1371/journal.pone.0031214.g002

We see on the left-hand side that the $j$-sphere cardinalities are rather small when $j$ approaches the diameter and, hence, their contribution to the value of the functional is small too. Also, there is not much variation between the $j$-sphere cardinalities. This could be a reason that the resulting probability values are quite similar to each other and, thus, this has a direct influence on the resulting value of the information index and on its uniqueness. In contrast, the right-hand side of Figure 2 shows that the degree-based values are more diverse and, in particular, that the values for $j$ close to the diameter are larger than the $j$-sphere cardinalities. This might be a plausible reason why the corresponding vertex probability values differ more and, hence, the resulting entropies as well. As Tables 2 and 3 show, we again emphasize that the discriminative power of an index clearly depends on the underlying graph class.

Evaluation of the Discriminative Power by Using Chemical Graphs.

Here we evaluate the uniqueness of the Balaban index, the information measures using the information functional approach, and the remaining topological descriptors shown in Table 1 by also using chemical graphs. Table 5 depicts the numerical results when applying the measures to chemical alkane trees representing the skeletal graphs. We see again that the discriminative power of the Balaban index decreases when the number of graphs and vertices increases. The Balaban-like indices possess high discriminative power for all four graph classes. Also, we observe that the sum of the local vertex entropies due to Konstantinova [13], [45] has high uniqueness. Interestingly, it is as good as two of the information indices based on the information functional approach. It can be easily shown that, for trees, the information indices using two of the three functionals have equal discriminative power. In particular, the just mentioned indices clearly outperform the Balaban index on the chemical alkane trees.

Table 5. Chemical alkane trees.

https://doi.org/10.1371/journal.pone.0031214.t005

Finally, the numerical results show again that the discriminative power of a structural index strongly depends on the underlying graph class. See, for instance, the results when comparing the uniqueness of one and the same index for the alkane trees (see Table 5) and for the exhaustively generated graphs (see Table 3).

Descriptive Statistical Analysis.

In order to provide further evidence for the stability of the uniqueness of the degree–degree association index on exhaustively generated graphs, we perform a statistical analysis by using boxplots. The graph class used for this study is the set of exhaustively generated graphs with ten vertices. For computational reasons, the statistical analysis cannot be performed on the entire set. Hence, we choose subsets whose sizes are called sample sizes. We perform the boxplot analysis for the Balaban index as well, and present the resulting plots to investigate the dependence between uniqueness and sample size; see Figure 3. Concretely, 100 samples each of 1100, 3300, 11 000, 33 000, 100 000, and 333 000 randomly chosen graphs from this class have been analyzed by standard R boxplot routines. That means the medians have been calculated and plotted, with the first and third quartiles as hinges. The whiskers represent the calculated borders of the 95% confidence interval.
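The sampling scheme can be sketched as follows. This is not the R code used for the study; it is a hedged illustration in which `values` stands for the precomputed index values of all graphs in the class, the function names and the fixed random seed are assumptions, and the toy data are synthetic (every value occurs exactly twice).

```python
import random
from collections import Counter


def uniqueness(values):
    """Fraction of entries whose value occurs exactly once in the sample."""
    counts = Counter(values)
    return sum(1 for v in values if counts[v] == 1) / len(values)


def uniqueness_by_sample_size(values, sample_sizes, repeats=100, seed=0):
    """For each sample size, draw `repeats` random subsets and record their uniqueness."""
    rng = random.Random(seed)
    return {
        size: [uniqueness(rng.sample(values, size)) for _ in range(repeats)]
        for size in sample_sizes
    }


# Synthetic index values: 500 distinct values, each shared by two "graphs".
toy_values = [i // 2 for i in range(1000)]
result = uniqueness_by_sample_size(toy_values, [10, 1000], repeats=20, seed=1)
```

The lists in `result` are what a boxplot per sample size would summarize. Even this toy index, which is completely degenerate on the full set, looks highly unique on small samples, because two graphs sharing a value are rarely drawn together; a stable index should show nearly the same uniqueness at every sample size.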

Figure 3. Boxplots to investigate the dependency of the uniqueness of the Balaban index and of the degree–degree association index on the sample size by using exhaustively generated graphs with ten vertices.

https://doi.org/10.1371/journal.pone.0031214.g003

As we can see in Figure 3, the uniqueness values are not dispersed for a given sample size, but they depend on the sample size. Further, we observe that the uniqueness of the Balaban index is not stable when the sample size is varied. In general, we call a measure unstable if there is a strong dependency between the uniqueness of the measure and the sample size used in the statistical analysis. In contrast, a measure is stable if there is only very little dependency between its uniqueness and the sample size.

We see from the boxplot that the uniqueness decreases if the sample size increases. Intuitively, it seems reasonable that the smaller the sample size, the better the discriminative power of the measure under consideration. Thus, the degree–degree association index possesses a non-trivial property, namely a very high discriminative power for exhaustively generated graphs that is almost independent of sample size. By using the above stated definition, we see that this index is stable on the graphs with ten vertices, as its uniqueness is constantly high and does not depend much on the sample size. We see from Table 3 that it is the only topological descriptor possessing this property. Other topological measures, and particularly the Balaban index, have the trivial property that, for exhaustively generated graphs, the uniqueness is only reasonable for small sets of graphs.

Hence, some of the entropy measures using the information functional approach could be applied successfully for discriminating sets of large complex networks as well. Keep in mind that such classes of exhaustively generated complex networks possess huge cardinalities. Note that the cardinality of the exhaustively generated non-isomorphic graphs with 10 vertices is already greater than 11 million. As we conclude from this statistical analysis, the degree–degree association index possesses the stability property that is necessary to achieve feasible results when applied to sets of large complex networks.

Summary and Conclusion

In this paper, we have dealt with the problem of evaluating the discriminative power of topological graph measures by using exhaustively generated, non-isomorphic graphs without vertex and edge weights. We have made an attempt to translate topological indices into the field of complex networks when evaluating their uniqueness. We found that one of the information measures for graphs using the information functional based on degree–degree associations outperformed the Balaban index tremendously. Also, by using the class of exhaustively generated graphs with ten vertices, we found that the uniqueness of the Balaban index is quite sensitive to varying sample size when performing the statistical analysis; see the ‘Statistical analysis’ section. In particular, the uniqueness of the Balaban index deteriorated when increasing the sample size. This makes the Balaban index in particular infeasible for discriminating complex networks structurally, as they are multicyclic, do not have structural constraints, and the cardinality of an underlying set of such networks is huge. This property was also observed for other topological indices shown in Table 1. The numerical results when using exhaustively generated graphs and alkane trees can be found in Tables 2, 3, and 5.

Altogether, this study clearly shows the limitations of topological indices and restrictions when applying them on a large scale. A topological index can be unique for a particular graph class but fail when applied to another class. In this sense, it is far from trivial that we obtained an index (the degree–degree association index; see equation 18) that turned out to be highly discriminating for exhaustively generated graph classes. Note that the underlying graphs do not possess structural constraints.

As to future work, we will evaluate further topological indices on a large scale to obtain deeper theoretical insights. From such an analysis, one can also learn how the measures capture structural information. This relates to better understanding of their structural interpretation. We are convinced that these developments could also trigger future developments positively when developing and investigating topological graph measures in the context of complex networks.

Author Contributions

Analyzed the data: MD MG KV. Wrote the paper: MD MG KV.

References

1. Bonchev D (1983) Information Theoretic Indices for Characterization of Chemical Structures. Research Studies Press, Chichester.
2. Bonchev D, Rouvray DH (2005) Complexity in Chemistry, Biology, and Ecology. Mathematical and Computational Chemistry. Springer. New York, NY, USA.
3. da F Costa L, Rodrigues F, Travieso G (2007) Characterization of complex networks: A survey of measurements. Advances in Physics 56: 167–242.
- View Article
- Google Scholar
4. Dehmer M, Mowshowitz A (2011) A history of graph entropy measures. Information Sciences 1: 57–78.
- View Article
- Google Scholar
5. Emmert-Streib F, Dehmer M (2007) Information theoretic measures of UHG graphs with low computational complexity. Applied Mathematics and Computation 190: 1783–1794.
- View Article
- Google Scholar
6. Mehler A, Weiß P, Lücking A (2010) A network model of interpersonal alignment. Entropy 12: 1440–1483.
- View Article
- Google Scholar
7. Todeschini R, Consonni V, Mannhold R (2002) Handbook of Molecular Descriptors. Wiley-VCH. Weinheim, Germany.
8. Bonchev D, Trinajstić N (1977) Information theory, distance matrix and molecular branching. J Chem Phys 67: 4517–4533.
- View Article
- Google Scholar
9. Bonchev D, Mekenyan O, Trinajstić N (1981) Isomer discrimination by topological information approach. J Comp Chem 2: 127–148.
- View Article
- Google Scholar
10. Trinajstić N (1992) Chemical Graph Theory. CRC Press. Boca Raton, FL, USA.
11. Raychaudhury C, Ray SK, Ghosh JJ, Roy AB, Basak SC (1984) Discrimination of isomeric structures using information theoretic topological indices. Journal of Computational Chemistry 5: 581–588.
- View Article
- Google Scholar
12. Konstantinova EV (1996) The discrimination ability of some topological and information distance indices for graphs of unbranched hexagonal systems. J Chem Inf Comput Sci 36: 54–57.
13. Konstantinova EV, Paleev AA (1990) Sensitivity of topological indices of polycyclic graphs. Vychisl Sistemy 136: 38–48.
14. Diudea MV, Ilić A, Varmuza K, Dehmer M (2011) Network analysis using a novel highly discriminating topological index. Complexity 16: 32–39.
15. Dehmer M, Varmuza K, Borgert S, Emmert-Streib F (2009) On entropy-based molecular descriptors: Statistical analysis of real and synthetic chemical structures. J Chem Inf Model 49: 1655–1663.
16. Xu CYHL (1996) On highly discriminating molecular topological index. J Chem Inf Comput Sci 36: 82–90.
17. Kim J, Wilhelm T (2008) What is a complex graph? Physica A 387: 2637–2652.
18. Dorogovtsev SN, Mendes JFF (2003) Evolution of Networks. From Biological Networks to the Internet and WWW. Oxford University Press.
19. Balaban AT (1982) Highly discriminating distance-based topological index. Chem Phys Lett 89: 399–404.
20. Vukičević D, Balaban AT (2005) On the degeneracy of topological index J. Internet Electronic Journal of Molecular Design 4: 491–500.
21. Dehmer M (2008) Information processing in complex networks: Graph entropy and information functionals. Appl Math Comput 201: 82–94.
22. Dehmer M, Barbarini N, Varmuza K, Graber A (2009) A large scale analysis of information-theoretic network complexity measures using chemical structures. PLoS ONE 4: e8057.
23. Li X, Gutman I (2006) Mathematical Aspects of Randić-Type Molecular Structure Descriptors. Mathematical Chemistry Monographs. University of Kragujevac and Faculty of Science Kragujevac.
24. Zhou B (2008) Bounds on the Balaban index. Croatica Chemica Acta 81: 319–323.
25. Dehmer M, Borgert S, Emmert-Streib F (2008) Entropy bounds for molecular hierarchical networks. PLoS ONE 3: e3079.
26. Dehmer M, Borgert S, Bonchev D (2008) Information inequalities for graphs. Symmetry: Culture and Science Symmetry in Nanostructures (Special issue edited by M Diudea) 19: 269–284.
27. Devillers J, Balaban AT (1999) Topological Indices and Related Descriptors in QSAR and QSPR. Gordon and Breach Science Publishers. Amsterdam, The Netherlands.
28. Dehmer M, Barbarini N, Varmuza K, Graber A (2010) Novel topological descriptors for analyzing biological networks. BMC Structural Biology 10:
29. Skorobogatov VA, Dobrynin AA (1988) Metrical analysis of graphs. Commun Math Comp Chem 23: 105–155.
30. Bonchev D (2009) Information theoretic measures of complexity. In: Meyers R, editor. pp. 4820–4838. Encyclopedia of Complexity and System Science, Springer, volume 5.
31. Mowshowitz A (1968) Entropy and the complexity of the graphs I: An index of the relative complexity of a graph. Bull Math Biophys 30: 175–204.
32. Shannon CE, Weaver W (1949) The Mathematical Theory of Communication. University of Illinois Press.
33. Dijkstra EW (1959) A note on two problems in connection with graphs. Numerische Math 1: 269–271.
34. Dehmer M, Emmert-Streib F, Tsoy Y, Varmuza K (2011) Quantifying structural complexity of graphs: Information measures in mathematical chemistry. In: Putz M, editor. pp. 479–498. Quantum Frontiers of Atoms and Molecules, Nova Publishing.
35. Harary F (1969) Graph Theory. Addison Wesley Publishing Company. Reading, MA, USA.
36. Balaban AT, Balaban TS (1991) New vertex invariants and topological indices of chemical graphs based on information on distances. J Math Chem 8: 383–397.
37. McKay BD (2010) Nauty. http://cs.anu.edu.au/~bdm/nauty/.
38. McKay BD (1998) Isomorph-free exhaustive generation. Journal of Algorithms 26: 306–324.
39. (2000) Molgen isomer generator software. www.molgen.de. Institute of Mathematics II, University of Bayreuth, Germany.
40. (2011) R, software, a language and environment for statistical computing. www.r-project.org. R Development Core Team, Foundation for Statistical Computing, Vienna, Austria.
41. Müller LAJ, Kugler KG, Dander A, Graber A, Dehmer M (2010) QuACN - an R package for analyzing complex biological networks quantitatively. Bioinformatics 140–141.
42. Rashevsky N (1955) Life, information theory, and topology. Bull Math Biophys 17: 229–235.
43. Randić M (1975) On characterization of molecular branching. J Amer Chem Soc 97: 6609–6615.
44. Wiener H (1947) Structural determination of paraffin boiling points. Journal of the American Chemical Society 69: 17–20.
45. Konstantinova EV, Skorobogatov VA, Vidyuk MV (2002) Applications of information theory in chemical graph theory. Indian Journal of Chemistry 42: 1227–1240.
46. Bertz SH (1981) The first general index of molecular complexity. Journal of the American Chemical Society 103: 3241–3243.
