Conceived and designed the experiments: YNK DYK. Performed the experiments: YNK DYK. Analyzed the data: YNK DYK. Contributed reagents/materials/analysis tools: DYK. Wrote the paper: YNK DYK EB-J MF.
Eshel Ben-Jacob is a PLoS ONE Academic Editor. This does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials.
Semantic memory has generated much research. However, most investigations have focused on the English language, and far fewer on other languages, such as Hebrew. Furthermore, little research has been done on search processes within the semantic network, even though such processes are abundant in cognitive semantic phenomena.
We examine a unique dataset of free association norms to a set of target words and make use of correlation and network theory methodologies to investigate the global and local features of the Hebrew lexicon. The global features of the lexicon are investigated through the use of association correlations – correlations between target words, based on their association responses similarity; the local features of the lexicon are investigated through the use of association dependencies – the influence words have in the network on other words.
Our investigation uncovered Small-World Network features of the Hebrew lexicon, specifically a high clustering coefficient and a scale-free distribution, and provides means to examine how words group together into semantically related ‘free categories’. Our novel approach enables us to identify how words facilitate or inhibit the spread of activation within the network, and how these words influence each other. We discuss how these properties relate to classical research on spreading activation and suggest that these properties influence cognitive semantic search processes. A semantic search task, the Remote Association Test, is discussed in light of our findings.
Search processes, both conscious and unconscious, are abundant within the cognitive system, across all domains. To note just a few examples – whenever we need to apply various semantic memory tasks, we constantly invoke search processes within the mental lexicon
The classical models of semantic memory, developed in the 1970's
The spreading activation model for semantic memory presented by Collins and Loftus
As a result of challenges to the Collins and Quillian model
Since the introduction of these two classic frameworks for semantic memory in the 1970's, and the extensive research based on them, other computational models of semantic memory have been suggested. Two examples are Latent Semantic Analysis (LSA) and the Hyperspace Analogue to Language (HAL) model, both of which extract semantic relatedness by analyzing co-occurrences of words within corpora of texts (for an extensive review, see
In recent years, the Small World Network (SWN) has gained a lot of attention with regard to its description of complex networks. This model
The third main characteristic of small-world networks is the degree distribution [P(k)] – the distribution of the number of edges (k) per node in the network. This characteristic is significant because complex systems do not follow the Gaussian (normal) distribution, and instead present scaling law distributions (such as exponential, or power-law)
In the past few years, the application of the SWN model within neuroscience research has been growing rapidly
Studying the semantic lexicon with complex network methodology, based on association networks, has great merit. This is due to the general agreement, from a psychological point of view, that associations are one of the organizing principles of semantic memory
The above mentioned SWN characteristics of the semantic lexicon have previously been investigated in English, Dutch, German and Spanish
Furthermore, in the present research we employ novel network methodologies to explore global and local features of semantic networks which influence search processes within the semantic network. This was achieved by analyzing a unique dataset of free associations in Hebrew, examining for the first time the characteristics of the Hebrew semantic lexicon. We begin by examining its global network features and by charting the network's topology. Next we investigate the local features of the network, a process which allows us to observe causal relations between the nodes of the network. We conclude our research by proposing that the global and local characteristics of the network entail cognitive semantic search processes and illustrate our proposal with the Remote Association Test
The data analyzed in this study consists of free association norms in Hebrew gathered by
In total, the subjects were presented with 800 different target words, in four separate sessions (200 target words in each session; see
Histogram of the number of association responses to target words.
In order to analyze the dataset, we first standardized the data into a matrix, in which every column is a different target word and every row is a different association response to a target word, deriving a 123664×800 matrix. Since many similar association responses were received for different target words and due to various typing errors within the data, we proceeded to a preprocessing phase in order to construct a matrix where each row was a unique singular association response. This preprocessing stage entailed two actions – standardizing association responses (e.g. neighbour→neighbor; 3.5% of all responses) and converting plural into singular (e.g. fruits→fruit; 13.5% of all responses). Next, all standardized association responses were organized into a single matrix (123664 association responses by 800 target words) and identical association responses were merged using the Minitab software (
First, we computed the association correlation matrix from the association data. The correlations between the target word associations profiles (the associations of the target words given by all subjects), were calculated by Pearson's formula:
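The correlation step can be illustrated with a minimal sketch. It assumes the preprocessed data is available as a response-by-target count matrix (rows are unique association responses, columns are target words); the function name `association_correlations` and the toy counts are hypothetical and not taken from the study.

```python
import numpy as np

def association_correlations(counts):
    """Pearson correlations between target words (columns).

    counts[r, t] is assumed to hold how many subjects gave response r
    to target word t; this layout is an assumption for illustration.
    """
    counts = np.asarray(counts, dtype=float)
    # rowvar=False treats each column (target word) as one variable
    return np.corrcoef(counts, rowvar=False)

# Toy example: 3 responses (rows) by 3 target words (columns)
toy_counts = np.array([[2, 4, 0],
                       [1, 2, 3],
                       [0, 0, 5]])
corr = association_correlations(toy_counts)
```

In this toy matrix the first two targets have proportional response profiles, so their correlation is 1, while the third target is negatively correlated with the first. A target word with a constant response profile would produce undefined (NaN) correlations and would need separate handling.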
The association correlation matrix can be studied in terms of an adjacency matrix of a weighted, undirected network. In this view, each target word is a node in the network, and an edge (link) between two nodes (words) is the correlation between these two nodes, with the correlation value being the weight of that link. Thus, the association correlation matrix represents a fully connected weighted network in which the nodes represent the target words, and the links represent the correlations between these words.
The complete association correlation network for
To construct the planar maximally filtered graph (PMFG) we first order the
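The PMFG construction can be sketched as a greedy procedure: candidate links are visited from the strongest correlation to the weakest, and a link is kept only if the graph stays planar. The sketch below is an illustration under stated assumptions, not the authors' implementation; it relies on the NetworkX function `check_planarity`, and the function name `pmfg` is hypothetical.

```python
import itertools

import networkx as nx
import numpy as np

def pmfg(corr):
    """Greedy planar maximally filtered graph from a symmetric matrix."""
    n = corr.shape[0]
    # Candidate links, strongest correlation first
    pairs = sorted(itertools.combinations(range(n), 2),
                   key=lambda p: corr[p[0], p[1]], reverse=True)
    graph = nx.Graph()
    graph.add_nodes_from(range(n))
    max_edges = 3 * (n - 2)  # edge count of a maximal planar graph
    for i, j in pairs:
        graph.add_edge(i, j, weight=float(corr[i, j]))
        is_planar, _ = nx.check_planarity(graph)
        if not is_planar:
            graph.remove_edge(i, j)  # reject links that break planarity
        if graph.number_of_edges() == max_edges:
            break
    return graph

# Toy example on a random symmetric 6x6 matrix
rng = np.random.default_rng(0)
m = rng.random((6, 6))
toy_graph = pmfg((m + m.T) / 2)
```

For 800 target words there are 319,600 candidate links, each requiring a planarity test, so this naive version would be slow; it is meant only to make the filtering rule concrete.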
The network parameters were mainly calculated with the Brain Connectivity Toolbox for Matlab
Constructing the association correlation network enables studying its topological properties. First, we made use of Newman's modularity measure
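For readers unfamiliar with the modularity measure, the sketch below computes Newman's Q for a given partition of an undirected, unweighted network. It assumes the partition is already known; finding the partition that maximizes Q is a separate optimization step that is not shown. The function name and the toy network are hypothetical.

```python
import numpy as np

def modularity(adj, labels):
    """Newman's modularity Q of a partition of an undirected network.

    Q = (1 / 2m) * sum_ij (A_ij - k_i * k_j / 2m) * delta(c_i, c_j)
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    degrees = adj.sum(axis=1)
    two_m = degrees.sum()                     # twice the number of edges
    same = labels[:, None] == labels[None, :]  # delta(c_i, c_j)
    expected = np.outer(degrees, degrees) / two_m
    return float(((adj - expected) * same).sum() / two_m)

# Toy network: two triangles joined by a single link (2-3)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
toy_adj = np.zeros((6, 6))
for a, b in edges:
    toy_adj[a, b] = toy_adj[b, a] = 1
q = modularity(toy_adj, [0, 0, 0, 1, 1, 1])
```

Splitting the toy network into its two triangles gives Q = 5/14, whereas placing all nodes in one module gives Q = 0.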
The semantic network representation allows searching for words that have a significant importance in the semantic lexicon. In network theory, the importance of each node in a given network is quantified using different measures, such as the betweenness measure and eigenvalue centrality
The dependency network approach provides a new analysis of the activity and topology of directed networks. The approach extracts causal topological relations between the network's nodes, and provides an important step towards inference of causal activity relations between the network nodes.
In the case of network activity, the analysis is based on partial correlations, which are increasingly used to investigate complex systems (i.e.
The first order partial correlation coefficient is a statistical measure indicating how a third variable affects the correlation between two other variables
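For reference, the standard first-order partial correlation between variables i and j given a third variable k can be written as follows. The notation C(i,j) for the pairwise correlation is an assumption made here for illustration.

```latex
\rho(i,j \mid k) =
  \frac{C(i,j) - C(i,k)\,C(k,j)}
       {\sqrt{\left[1 - C^{2}(i,k)\right]\left[1 - C^{2}(k,j)\right]}}
```

When k is unrelated to both i and j, the two correlations involving k vanish and the partial correlation reduces to C(i,j).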
The relative effect of the correlations
Next, we define the total influence of node
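The dependency computation can be sketched as follows. The sketch assumes one common formulation of the dependency network approach: the effect of word k on the pair (i, j) is the drop from the correlation to the partial correlation, and the total influence of k on i is the average of that effect over all other words j. The function names, the averaging convention and the index order are assumptions, since the exact definitions are not reproduced here.

```python
import numpy as np

def partial_corr(C, i, j, k):
    """First-order partial correlation of i and j given k."""
    num = C[i, j] - C[i, k] * C[j, k]
    den = np.sqrt((1 - C[i, k] ** 2) * (1 - C[j, k] ** 2))
    return num / den

def dependency_matrix(C):
    """D[i, k]: assumed average influence of node k on node i."""
    n = C.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for k in range(n):
            if i == k:
                continue
            # Correlation influence d(i, j | k) = C(i, j) - rho(i, j | k)
            effects = [C[i, j] - partial_corr(C, i, j, k)
                       for j in range(n) if j not in (i, k)]
            D[i, k] = np.mean(effects)
    return D

# Toy example: three words, all pairwise correlations equal to 0.5
toy_C = np.array([[1.0, 0.5, 0.5],
                  [0.5, 1.0, 0.5],
                  [0.5, 0.5, 1.0]])
toy_D = dependency_matrix(toy_C)
```

Unlike the correlation matrix, the resulting matrix is in general not symmetric, which is what makes a directed dependency network possible. Off-diagonal correlations of exactly 1 or -1 would cause a division by zero and are not handled in this sketch.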
Note that the association correlation network and the association dependency network target different levels of analysis of the Hebrew lexicon. The association correlation network presents the similarity of target words, according to the association responses provided by the subjects. The association dependency network provides local information on the interaction between words; this network reflects how one word affects the correlations of all other target words. Thus, for example, the nodes dough (‘batzek’) and flour (‘kemach’) have a strong similarity in the association responses given to both words, and thus are connected to each other in the association correlation network (global level). However, the node dough (‘batzek’) does not have a strong influence on the correlations of the node flour (‘kemach’) with all other nodes, and thus these two nodes will not be connected in the association dependency network (local level). The association correlation network provides the global information of the semantic lexicon, whereas the association dependency network provides the local (and potentially causal) information of the semantic lexicon.
We begin by calculating the association correlation matrix. Next, we use the dendrogram hierarchical clustering process
The dendrogram hierarchical clustering method is used to find cliques of words with a strong semantic similarity (left panel), and then to order the normalized association correlation matrix (right panel).
Next, we construct the association semantic network from the association correlation matrix, using the PMFG filtering process (see above). We then calculate different SWN properties of the semantic network. The values of the different SWN parameters calculated are summarized in
Parameter | Value |
 | 800 |
 | 10.0349 |
 | 25 |
CC | 0.6831 |
 | 5.9425 |
γ | 3.5 |
CCrand | 0.0054 |
 | 3.9450 |
S | 34.3728 |
 | 0.5647 |
 | 56 |
These results clearly show the SWN characteristics of the Hebrew association correlation network. The clustering coefficient is much higher than that of the random graph (CC = 0.6831 > CCrand = 0.0054). The small-world-ness measure clearly signifies a SWN (S = 34.37), which was also statistically tested and found significant (see
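A small-world-ness score is commonly defined as the ratio of the normalized clustering coefficient to the normalized path length. The sketch below assumes that common definition; the function name, the argument names and the illustrative input values are hypothetical and are not the values reported for the Hebrew network.

```python
def small_worldness(cc, cc_rand, path_len, path_len_rand):
    """S = (CC / CC_rand) / (L / L_rand); S > 1 suggests a small world."""
    if min(cc_rand, path_len, path_len_rand) <= 0:
        raise ValueError("reference values must be positive")
    clustering_ratio = cc / cc_rand      # how much more clustered than random
    path_ratio = path_len / path_len_rand  # how much longer paths are than random
    return clustering_ratio / path_ratio

# Illustrative values only
s_example = small_worldness(cc=0.5, cc_rand=0.05, path_len=4.0, path_len_rand=2.0)
```

A network that is far more clustered than a matched random graph while keeping comparable path lengths yields a large S.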
Examining the degree distribution clearly reveals a non-Gaussian distribution, with a scale-free
Plot of degree distribution of target words in the correlation network, in a log-log scale.
As can be seen in
To visualize the network we plotted the graph using Cytoscape
Representation of the entire network of 800 words, as they are grouped together in the planar graph, constructed from the association correlations. Each word is a node in the network (green circle), and a link between two words represents their association correlation (blue line).
The clique shown in
An example of a clique from the full network, semantically concentrated on the notion of making bread.
A second example of cliques within the network is that of three cliques connected to each other in the full network (
An example of three cliques from the full network, semantically concentrated on foot, sky and hiking. The three cliques are related in their semantic focus, with the left centered on the notion of feet, and the right bottom centered on the notion of the sky, and the top right centered on the notion of hiking.
These three cliques are connected to each other via two ‘gateway nodes’ (
The cliques presented in
Finally, we investigated the impact of a given word
The impact of a given word
As described above, the path length of the network represents the relations between the nodes in the network, and more specifically directly relates to association strength which is a determining factor in the spread of activation
A positive impact score signifies that after the deletion of word
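One way to make the impact idea concrete is to compare the network's mean shortest-path length with and without a given word. The sketch below assumes that impact is the path length after deletion minus the path length before deletion, averaged over all pairs that remain connected; the sign convention, the normalization and the function names are assumptions, since the exact definition is not reproduced here.

```python
from collections import deque

def mean_path_length(adj, skip=None):
    """Mean shortest-path length over reachable ordered pairs.

    adj maps each node to a list of neighbours; `skip` is a node
    treated as deleted from the network.
    """
    total, pairs = 0, 0
    for src in adj:
        if src == skip:
            continue
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for v in adj[u]:
                if v != skip and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

def impact(adj, word):
    """Assumed impact: change in mean path length after deleting `word`."""
    return mean_path_length(adj, skip=word) - mean_path_length(adj)

# Toy network: a six-node ring in which 'h' closes the loop
ring = {"a": ["b", "h"], "b": ["a", "c"], "c": ["b", "d"],
        "d": ["c", "e"], "e": ["d", "h"], "h": ["a", "e"]}
```

Under this convention, deleting a word that provides shortcuts lengthens the remaining paths and yields a positive score, while deleting a peripheral word shortens the average and yields a negative score.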
Facilitative Hub (FH) | Impact | Inhibitive Hub (IH) | Impact |
Saad (to nurse) | 1.969611 | Zricha (sunrise) | −0.57214 |
Heechil (fed) | 1.759909 | Mevushal (cooked) | −0.48049 |
Nedava (donation) | 1.628782 | Kurkum (turmeric) | −0.45816 |
Sinor (apron) | 1.220627 | Itria (noodle) | −0.45045 |
Kruvit (cauliflower) | 1.022254 | Kamun (cumin) | −0.44953 |
Aruga (flowerbed) | 1.01617 | Bishel (to cook) | −0.42135 |
Kabtzan (beggar) | 0.965875 | Histabech (got in trouble) | −0.40847 |
Asuphi (waif) | 0.894084 | Poshea (criminal) | −0.34226 |
Pashtida (pie) | 0.868739 | Goses (dying) | −0.33363 |
Salat (salad) | 0.866699 | Munsham (being ventilated) | −0.33363 |
Neft (oil) | 0.768617 | Chol (sand) | −0.26792 |
Orev (crow) | 0.464451 | Tipel (treated) | −0.24959 |
Ataleph (bat) | 0.44322 | Hanaa (enjoyment) | −0.18504 |
Atzitz (flowerpot) | 0.425491 | Hanaka (breast-feeding) | −0.17735 |
Hityatem (to be orphaned) | 0.350795 | Arisa (cradle) | −0.1773 |
Benzin (gasoline) | 0.345807 | ||
Givol (stem) | 0.306829 | ||
Seara (storm) | 0.293871 | ||
Izdarechet (margosa tree) | 0.269877 | ||
Miphrasit (sailboat) | 0.269191 | ||
Mechonit (car) | 0.260143 | ||
Dolar (dollar) | 0.212108 |
The words are ordered in descending order of their impact strength.
While the importance of FH and IH demands further research, it is interesting to note the FH ‘pashtida’ (pie; impact 0.868739) and the IH ‘mevushal’ (cooked; impact −0.48049). Both connect the clique of bread making to the rest of the network (
We constructed the association dependency network from the association correlation matrix, by calculating the partial correlations and then using the PMFG filtering process (see above) to extract the association dependency network, resulting in an 800×800 binary directed network. To inspect the association dependency network topology we plotted the network using Cytoscape
A 2D visualization of the full association dependency network (left panel), and an example of a dependency clique in the network, showing association dependencies and related to the notion of making bread (right panel).
Exploring the topology of the network reveals a highly modular topology. Calculating the modularity measure
One such influence clique is presented in
On this network we calculated for every node its outDegree, which signifies the influence score of each node (i.e. how many nodes are affected by node
OutDegree (left panel) and inDegree distribution (right panel) of node dependency. The outDegree refers to how many nodes are influenced by node
In order to examine the differences between the outDegree and inDegree distribution, we analyzed the nodes Relative Influence score, which provides a more objective significance of a node
Percentage of different types of nodes, based on their relative influence score – influence nodes are nodes that have an outDegree > 1 and inDegree = 0; receiver nodes are nodes that have an outDegree = 0 and inDegree > 1; zero nodes are nodes that have an outDegree = inDegree; negative nodes are nodes that have an outDegree < inDegree; and positive nodes are nodes that have an outDegree > inDegree.
It should be noted that while only 4% of the nodes act as influence nodes in the network, nearly 30% of the nodes act as receiver nodes in the network, and putting the zero nodes aside, there is a 29.5% positive (influence effect) - 52.325% negative (receiver effect) division of the network. This shows that the network influence dynamics is governed by a relatively small number of influence (full or partial) nodes and a larger number of receiver (full or partial) nodes.
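The node typology can be sketched from a binary directed dependency matrix. The sketch assumes that entry [i, j] = 1 means node i influences node j, and that the Relative Influence score is (outDegree − inDegree) / (outDegree + inDegree); both the matrix orientation and this formula are assumptions made for illustration. The thresholds follow the figure caption as written, with the stricter influence and receiver types checked first.

```python
import numpy as np

def node_roles(dep):
    """Relative influence scores and role labels for each node."""
    dep = np.asarray(dep)
    out_deg = dep.sum(axis=1)  # how many nodes each node influences
    in_deg = dep.sum(axis=0)   # how many nodes influence each node
    total = out_deg + in_deg
    # Assumed RI formula; isolated nodes get a score of 0
    ri = np.where(total > 0, (out_deg - in_deg) / np.maximum(total, 1), 0.0)
    roles = []
    for o, i in zip(out_deg, in_deg):
        if o > 1 and i == 0:
            roles.append("influence")
        elif o == 0 and i > 1:
            roles.append("receiver")
        elif o == i:
            roles.append("zero")
        elif o > i:
            roles.append("positive")
        else:
            roles.append("negative")
    return ri, roles

# Toy network: node 0 influences nodes 1 and 2, node 1 influences node 2
toy_dep = [[0, 1, 1],
           [0, 0, 1],
           [0, 0, 0]]
toy_ri, toy_roles = node_roles(toy_dep)
```

In the toy network, node 0 is a pure influence node, node 1 is a zero node and node 2 is a pure receiver node.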
While the role of the 35 influence nodes is unclear and they constitute only 4% of the entire network, all of these nodes have strong outDegree scores, suggesting that they act as influence hubs in the network. Among the top 10 nodes with the highest outDegree scores (the most influential nodes in the network), 60% are such influence nodes.
X axis represents the nodes and Y axis represents the outDegree score. Highlighted in orange are nodes which are influence nodes, as described above.
Finally, we compared the results of the association dependency network analysis and that of the association correlation network analysis, by examining the relationship between the Facilitative (Inhibitive) Hubs impact score and their Relative Influence score. While there were only weak correlation coefficients between the RI and the impact score of the Facilitative (Inhibitive) Hubs (
Here we present a novel approach for studying the global and local features of semantic networks, and apply our approach to examine the Hebrew mental lexicon. The similarities between words based on their free association responses were calculated and used to construct the association correlation matrix. These association correlations were then used to analyze the Hebrew lexicon from a global and local perspective. From the global perspective, this was done by constructing a network representing the Hebrew semantic lexicon and by investigating the characteristics and topology of this network. From the local perspective, this was done by constructing a network which represents the influence effect that different nodes (words) in the network have on each other, and by exploring the characteristics of this influence effect. Furthermore, we investigated the relationship between the global and local levels of the network.
The method used in this research is novel in two ways: the use of free associations and our network analysis technique. The free association dataset analyzed differs from previous free association datasets in the number of associations generated by subjects per target word. As discussed above, we believe this method may offer a better way to explore the mental lexicon structure, and is in accord with Collins and Loftus
From the global point of view of the network, we have shown the SWN nature of the Hebrew mental lexicon. This conclusion joins a growing mass of work on the SWN nature of semantics in different languages
Furthermore, the construction of the network allows us to identify how the target words organize into sub-cliques, based on semantic categories. Thus, this method revealed how words organize themselves into natural or ‘free’ categories. This is illustrated by the example presented in
Finally, our calculation of the impact effect of a given word on the general network enables the identification of words that facilitate and inhibit the spread of activation within the network. This impact effect requires further investigation, but can be experimentally used in semantic memory paradigms, in order to investigate the organization of memory and memory retrieval patterns. Furthermore, it can be implemented in the study of individual differences, including clinical populations (e.g. patients suffering from schizophrenia, Asperger syndrome or semantic dementia) as a clinical tool. This clinical aspiration is strengthened by a recent study on autism, which used complex network analysis to investigate neurophysiological differences between autistic and control subjects
From the local system point of view, our analysis of the association dependency network allowed us to explore the local properties of the interaction of nodes within the lexicon. This analysis revealed balanced influence dynamics in the network, governed mainly by a small number of strong influence nodes (that only influence other nodes but are not influenced by any other nodes), and by a relatively large number of ‘receiver’ nodes (nodes that are only influenced by other nodes but do not influence any nodes). Thus, the dependency network exhibits a “scale-free” characteristic of dependency distribution. This node dependency information can enrich semantic network growth models
Finally, while the association correlation and dependency networks analyses relate to different and independent levels of the network, we did discover a weak relationship between the two, suggesting that the Facilitative Hubs have a tendency to act as influencing nodes and that the Inhibitive Hubs have a tendency to act more as receiver nodes in the network. These two independent properties of the lexicon (spread of activation and influence strength) are consistent with Lorch's findings, that contradicted the conventional approach that strong associations are activated faster and to a higher level than weak associations
While previous research examined the SWN of several Proto-German languages and mainly in English
In addition to shedding light on the structure of the Hebrew mental lexicon, these global and local features may explain various semantic cognitive search processes through semantic memory
One example of a task entailing a cognitive semantic search is Mednick's Remote Association Test (RAT
We suggest that the network properties of the lexicon described above, combined with the small world theory of insight
In summary, the work presented here adds to a growing mass of work analyzing the SWN nature of the semantic mental lexicon, and is the first such work in the Hebrew language. The method we have used provides a novel way to explore how words organize together and interact with each other within the mental lexicon. We propose that this SWN architecture of the mental lexicon may have significant implications for the understanding of various cognitive semantic search processes, and plan to further explore the results presented here with additional advanced clustering and network methodologies. We will also empirically investigate our results using various semantic paradigms, such as the RAT
While many questions on the nature of semantic memory and its properties remain open, we propose that bridging together cognitive phenomena such as creativity and the empirically proven Small World nature of the English
We thank O. Rubinstein, D. Anaki, A. Henik, S. Drori, and I. Farn for their permission to analyze their data.