## Abstract

Short-term memory in the brain cannot in general be explained the way long-term memory can – as a gradual modification of synaptic weights – since it takes place too quickly. Theories based on some form of cellular bistability, however, do not seem able to account for the fact that noisy neurons can collectively store information in a robust manner. We show how a sufficiently clustered network of simple model neurons can be instantly induced into metastable states capable of retaining information for a short time (a few seconds). The mechanism is robust to different network topologies and kinds of neural model. This could constitute a viable means available to the brain for sensory and/or short-term memory with no need of synaptic learning. Relevant phenomena described by neurobiology and psychology, such as local synchronization of synaptic inputs and power-law statistics of forgetting avalanches, emerge naturally from this mechanism, and we suggest possible experiments to test its viability in more biological settings.

**Citation:** Johnson S, Marro J, Torres JJ (2013) Robust Short-Term Memory without Synaptic Learning. PLoS ONE 8(1): e50276. doi:10.1371/journal.pone.0050276

**Editor:** Dante R. Chialvo, National Research & Technology Council, Argentina

**Received:** May 22, 2012; **Accepted:** October 23, 2012; **Published:** January 22, 2013

**Copyright:** © 2013 Johnson et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Funding:** This work was supported by Junta de Andalucía projects FQM-01505 and P09-FQM4682, by the joint Spanish Research Ministry (MEC) and the European Budget for the Regional Development (FEDER) project FIS2009-08451, and by the Granada Research of Excellence Initiative on Bio-Health (GREIB) translational project GREIB.PT_2011_19 of the Spanish Science and Innovation Ministry (MICINN) “Campus of International Excellence.” S.J. is grateful for financial support from the Oxford Centre for Integrative Systems Biology, and from the European Commission under the Marie Curie Intra-European Fellowship Programme PIEF-GA-2010-276454. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests:** The authors have declared that no competing interests exist.

### Introduction

#### Slow but sure, or fast and fleeting?

Memory – the storage and retrieval of information by the brain – is probably nowadays one of the best understood of all the collective phenomena to emerge in that most complex of systems. Thanks to a gradual modification of synaptic weights (the interaction strengths with which neurons signal to one another) particular patterns of firing and non-firing cells become energetically favourable and so systems evolve towards these attractors according to a mechanism known as Associative Memory [1]–[4]. In nature, these synaptic modifications occur via the biochemical processes of long-term potentiation (LTP) and depression (LTD) [5], [6]. However, some memory processes take place on timescales of seconds or less and in many instances cannot be accounted for by LTP and LTD [7], since these require at least minutes to be effected [8], [9]. For example, visual stimuli are recalled in great detail for up to about one second after exposure (iconic memory); similarly, acoustic information seems to linger for three or four seconds (echoic memory) [10], [11]. In fact, it appears that the brain actually holds and continually updates a kind of buffer in which sensory information regarding its surroundings is maintained (sensory memory) [12]. This is easily observed by simply closing one's eyes and recalling what was last seen, or thinking about a sound after it has finished. Another instance is the capability referred to as *working* memory [7], [13]: just as a computer requires RAM for its calculations despite having a hard drive for long-term storage, the brain must continually store and delete information to perform almost any cognitive task. We shall here use *short-term* memory to describe the brain's ability to store information on a timescale of seconds or less.

Evidence that short-term memory is related to sensory information while long-term memory is more conceptual can be found in psychology. For instance, a sequence of similar sounding letters is more difficult to retain for a short time than one of phonetically distinct ones, while this has no bearing on long-term memory, for which semantics seems to play the main role [14], [15]; and the way many of us think about certain concepts, such as chess, geometry or music, is apparently quite sensorial: we imagine positions, surfaces or notes as they would look or sound. Most theories of short-term memory – which almost always focus on working memory – make use of some form of previously stored information (i.e., of synaptic learning) and so can account for labelling tasks, such as remembering a particular series of digits or a known word, but not for the instant recall of novel information [16]–[18]. (This method can also be used to represent a continuous variable, such as the value of an angle or the length of an object, because concepts such as *angle* and *length* are in some sense already “known” at the time of the stimulus [19].) An interesting exception is the mechanism proposed by Chialvo *et al.* [20] which allows for arbitrary patterns of activity to be temporarily retained thanks to the refractory times of neurons.

Attempts to deal with novel information have been made by proposing mechanisms of *cellular bistability*: neurons are assumed to retain the state they are placed in (such as firing or not firing) for some period of time thereafter [21]–[23]. Although there may indeed be subcellular processes leading to a certain bistability, the main problem with short-term memory depending exclusively on such a mechanism is that if each neuron must act independently of the rest the patterns will not be robust to random fluctuations [7] – and the behaviour of individual neurons is known to be quite noisy [24]. It is worth pointing out that one of the strengths of Associative Memory is that the behaviour of a given neuron depends on many neighbours and not just on itself, which means that robust global recall can emerge despite random fluctuations at an individual level.

#### Harnessing network structure

At least until recently, most neural-network models have failed to take into account the structure of the network (its topology): it is often assumed that synapses are placed among the neurons completely at random, or even that all neurons are connected to all the rest. Although relatively little is yet known about the architecture of the brain at the level of neurons and synapses, experiments have shown that it is heterogeneous (some neurons have very many more synapses than others), clustered (two neurons have a higher chance of being connected if they share neighbours than if not) and highly modular (there are groups, or modules, with neurons forming synapses preferentially to those in the same module) [25], [26]. We show here that it suffices to use a more realistic network topology, in particular one that is modular and/or clustered, for a randomly chosen pattern of activity in which the system is placed to be metastable. This means that novel information can be instantly stored and retained for a short period of time in the absence of both synaptic learning and cellular bistability. The only requisite is that the patterns be coarse-grained versions of the usual patterns – that is, whereas it is often assumed that each neuron in some way represents one bit of information, we shall allocate a bit to a small group of neurons. (This does not, of course, mean that memories are expected to be encoded as bitmaps. In fact, we are not making any assumptions regarding neural coding.)

The mechanism, which we call Cluster Reverberation (CR), is very simple. If neurons in a group are more densely connected to each other than to the rest of the network, either because they form a module or because the network is significantly clustered, they will tend to retain the activity of the group: when they are all initially firing, they each continue to receive many action potentials and so go on firing, whereas if they start off silent, there is not usually enough input current from the outside to set them off. (This is similar to the ‘re-entrant’ activity exhibited by excitable elements [27].) The fact that each neuron's state depends on its neighbours confers on the mechanism a certain robustness to random fluctuations. This robustness is particularly important for biological neurons, which as mentioned are quite noisy. Furthermore, not only does the limited duration of short-term memory states emerge naturally from this mechanism (even in the absence of interference from new stimuli) but this natural forgetting follows power-law statistics, as has been observed experimentally [28]–[30]. It is also coherent with recent observations of locally synchronized neural activity *in vivo* [31], and of clustering in both synaptic inputs [32] and plasticity [33] during development. The viability of this mechanism in a more realistic setting could perhaps be put to the test by growing modular and/or clustered networks *in vitro* and carrying out experiments similar to those we perform here in simulation [34], [35] (see Discussion).

### Results

#### The simplest neurons on modular networks

Consider a network of $N$ model neurons, with activities $s_i = \pm 1$. The topology is given by the adjacency matrix $\hat{a}_{ij} = \{1, 0\}$, each element representing the existence or absence of a synapse from neuron $j$ to neuron $i$ ($\hat{a}$ need not be symmetric). In this kind of model – a network of what are often referred to as Amari-Hopfield neurons – each edge usually has a *synaptic weight* associated, $\omega_{ij} \in \mathbb{R}$, which serves to store information [1]–[4]. However, since our objective is to show how this can be achieved without synaptic learning, we shall here consider these to have all the same value: $\omega_{ij} = \omega > 0$. Neurons are updated in parallel (Little dynamics) at each time step, according to the stochastic transition rule

$$P(s_i \to \pm 1) = \frac{1}{2} \pm \frac{1}{2} \tanh\left(\frac{h_i}{T}\right), \qquad (1)$$

where $h_i = \omega \sum_j \hat{a}_{ij} s_j$ is the *field* at neuron $i$, and $T$ is a stochasticity parameter called *temperature*. This dynamics can be derived by considering coupled binary elements in a thermal bath, the transition rule stemming from energy differences between states [3], [4], [36].
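The parallel update can be illustrated with a minimal sketch. It assumes the standard Amari-Hopfield form, in which a neuron adopts the value +1 with probability ½[1 + tanh(h/T)] and the field h is the common weight times the sum of presynaptic activities. The names `little_step`, `in_neighbors`, `omega` and `T` are illustrative choices, not taken from the original work.

```python
import math
import random


def little_step(states, in_neighbors, omega, T, rng):
    """One parallel (Little) update of +/-1 neurons.

    states       -- list of activities, each +1 or -1
    in_neighbors -- in_neighbors[i] lists the presynaptic neurons j of i
    omega        -- common synaptic weight (assumed positive)
    T            -- temperature (stochasticity parameter), T > 0
    """
    new_states = []
    for i in range(len(states)):
        # Field at neuron i: weighted sum of presynaptic activities.
        h = omega * sum(states[j] for j in in_neighbors[i])
        # Assumed transition rule: P(s_i -> +1) = (1 + tanh(h / T)) / 2.
        p_up = 0.5 * (1.0 + math.tanh(h / T))
        new_states.append(1 if rng.random() < p_up else -1)
    return new_states
```

All new states are computed from the old ones before any is overwritten, which is what distinguishes parallel updating from sequential (one neuron at a time) schemes.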

We shall consider the network defined by to be made up of distinct modules. To achieve this, we can first construct separate random directed networks, each with nodes and mean degree (mean number of neighbours) . Then we evaluate each edge and, with probability , eliminate it (), to be substituted for another edge between the original (postsynaptic) neuron and a new (presynaptic) neuron chosen at random from among any of those in other modules (). We do not allow self-edges (although they can occur in reality) since these could be regarded as equivalent to a form of cellular bistability. Note that this protocol does not alter the number of presynaptic neighbours of each node, , although the number of postsynaptic neurons, , can vary. The parameter can be seen as a measure of *modularity* of the partition considered, since it coincides with the expected value of the proportion of edges that link different modules [37]. In particular, defines a network of disconnected modules, while yields a random network in which this partition has no modularity. If , the partition is less than randomly modular – i.e., it is *quasi-multipartite* (or multipartite if ).
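The construction protocol can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: every neuron starts with exactly `k` distinct presynaptic neighbours inside its own module (a fixed in-degree instead of a fluctuating one), rewired edges never duplicate an existing edge, and at least two modules exist whenever the rewiring probability `lam` is positive. The function and parameter names are hypothetical.

```python
import random


def build_modular_network(M, n, k, lam, rng):
    """Return in_neighbors for M modules of n neurons (in-degree k each).

    Assumptions of this sketch: k <= n - 1, M >= 2 if lam > 0, and a
    rewired edge never duplicates an existing one.
    """
    N = M * n
    in_neighbors = []
    for i in range(N):
        mod = i // n
        own = [j for j in range(mod * n, (mod + 1) * n) if j != i]  # no self-edges
        in_neighbors.append(rng.sample(own, k))
    for i in range(N):
        mod = i // n
        for idx in range(k):
            if rng.random() < lam:
                # Replace the presynaptic neuron by one from another module;
                # the postsynaptic neuron i keeps its in-degree.
                candidates = [j for j in range(N)
                              if j // n != mod and j not in in_neighbors[i]]
                in_neighbors[i][idx] = rng.choice(candidates)
    return in_neighbors


def inter_module_fraction(in_neighbors, n):
    """Proportion of edges whose two ends lie in different modules."""
    total = sum(len(nb) for nb in in_neighbors)
    cross = sum(1 for i, nb in enumerate(in_neighbors)
                for j in nb if j // n != i // n)
    return cross / total
```

Because only the presynaptic end of an edge is moved, the in-degree of every neuron is conserved while out-degrees fluctuate, and `inter_module_fraction` should approach the rewiring probability for large networks.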

#### Cluster reverberation

A memory pattern, in the form of a given configuration of activities, , can be stored in this system with no need of prior learning. (The system will recall the pattern perfectly when , .) Imagine a pattern such that the activities of all neurons found in any module are the same – i.e., , where the index denotes the module that neuron belongs to. The system can be induced into this configuration through the application of an appropriate stimulus: the field of each neuron will be altered for just one time step according to

where the factor is the intensity of the stimulus (see Fig. 1). This mechanism for dynamically storing information will work for values of parameters such that the system is sensitive to the stimulus, acquiring the desired configuration, yet also able to retain it for some interval of time thereafter (a similar setting is considered, for instance, in Ref. [38]).
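A toy example may help to fix ideas. The sketch below assumes that the stimulus simply adds the intensity times the pattern value to each neuron's field for one time step, and uses the standard tanh transition rule; both are assumptions of this illustration. The network (two disconnected three-neuron modules), the parameter values and all names are hypothetical.

```python
import math
import random


def step(states, in_neighbors, omega, T, rng, stimulus=None, delta=0.0):
    """Parallel update; if `stimulus` is given, add delta * stimulus[i] to h_i."""
    new_states = []
    for i in range(len(states)):
        h = omega * sum(states[j] for j in in_neighbors[i])
        if stimulus is not None:
            h += delta * stimulus[i]  # external input, applied for this step only
        p_up = 0.5 * (1.0 + math.tanh(h / T))
        new_states.append(1 if rng.random() < p_up else -1)
    return new_states


# Two disconnected modules of three neurons each (all-to-all inside a module).
in_neighbors = [[1, 2], [0, 2], [0, 1], [4, 5], [3, 5], [3, 4]]
pattern = [1, 1, 1, -1, -1, -1]          # one bit per module
rng = random.Random(0)

states = [-1, 1, -1, 1, 1, -1]           # arbitrary initial condition
states = step(states, in_neighbors, 1.0, 0.05, rng, stimulus=pattern, delta=10.0)
for _ in range(20):                       # free evolution, no stimulus
    states = step(states, in_neighbors, 1.0, 0.05, rng)
```

With such a strong stimulus and low temperature the pattern is adopted in a single step and then sustained by intra-module input alone. Weaker stimuli, higher temperatures or inter-module edges would make both steps probabilistic.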

**Figure 1. Diagram of a modular network composed of four five-neuron clusters.**

The four circles enclosed by the dashed line represent the stimulus: each is connected to a particular module, which adopts the input state (red or blue) and retains it after the stimulus has disappeared thanks to Cluster Reverberation.

doi:10.1371/journal.pone.0050276.g001

The two configurations of minimum energy of the system are and (see the next section for a more detailed discussion on energy). However, the energy is locally minimized for any configuration in which each module comprises either all active or all inactive neurons (that is, for configurations , with a binary variable specific to the whole module that neuron belongs to). These are the configurations that we shall use to store information. We define the mean activity of each module, , which is a mesoscopic variable, as well as the global mean activity, (these magnitudes change with time, but, where possible, we shall avoid writing the time dependence explicitly for clarity; stands for an average over ). The mean activity in a neural network model is usually taken to represent the mean firing rate measured in experiments [39]. The extent to which the network, at a given time, retains the pattern with which it was stimulated is measured with the *overlap* parameter . Ideally, the system should be capable of reacting immediately to a stimulus by adopting the right configuration, yet also be able to retain it for long enough to use the information once the stimulus has disappeared. A measure of performance for such a task is therefore

where is the time at which the stimulus is received and is the period of time we are interested in () [38]. If the intensity of the stimulus, , is very large, then the system will always adopt the right pattern perfectly and will only depend on how well it can then be retained. In this case, the best network will be one that is made up of mutually disconnected modules (). However, since the stimulus in a real brain can be expected to arrive via a relatively small number of axons, either from another part of the brain or directly from sensory cells, it might be more realistic to assume that is of a similar order as the input a typical neuron receives from its neighbours, .
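The two quantities involved can be computed in a few lines. This sketch assumes the usual normalized overlap (the mean of the product of state and pattern over neurons) and takes performance to be the time average of that overlap over the steps following a stimulus; the function names are hypothetical.

```python
def overlap(states, pattern):
    """Normalized overlap between the current state and the stimulus pattern."""
    return sum(s * x for s, x in zip(states, pattern)) / len(states)


def performance(overlaps):
    """Time average of the overlap over the steps following a stimulus."""
    return sum(overlaps) / len(overlaps)
```

An overlap of 1 means perfect retention, whereas a value near 0 means that the state is uncorrelated with the stimulus.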

Figure 2 shows the mean performance obtained in Monte Carlo (MC) simulations when the network is repeatedly stimulated with different randomly generated patterns. For low enough values of and stimuli of intensity , the system can capture and successfully retain any pattern it is “shown” for some period of time, even though this pattern was in no way previously learned. For less intense stimuli (), performance is nonmonotonic with modularity: there exists an optimal value of at which the system is sensitive to stimuli yet still able to retain new patterns quite well.

**Figure 2. Performance against for networks of the sort described in the main text with modules of neurons each, , obtained from Monte Carlo (MC) simulations; patterns are shown with intensities , and , and performance is computed every time steps, preceding the next random stimulus; (error bars represent standard deviations; lines – splines – are drawn as a guide to the eye).**

Inset: typical time series of (i.e., the overlap with whichever pattern was last shown) for (bad performance), (intermediate), and (optimal); with .

doi:10.1371/journal.pone.0050276.g002

Just as some degree of structural (quenched) noise, given by , can improve performance by increasing sensitivity, so too the dynamical (annealed) noise set by can have a similar effect. This apparent stochastic resonance is looked into below in Analysis.

#### Energy and topology

Each pair of neurons contributes a configurational energy [4]; that is, if there is an edge from to and they have opposite activities, the energy is increased by , whereas it is decreased by the same amount if their activities are equal. Given a configuration, we can obtain its associated energy by summing over all pairs. To study how the system relaxes from the metastable states (i.e., how it “forgets” the information stored) we shall be interested in configurations with neurons that have (and neurons with ), chosen in such a way that one module at most, say , has neurons in both states simultaneously. Therefore, , where is the number of modules with all their neurons in the positive state and is the number of neurons with positive sign in module . We can write and . The total configurational energy of the system will be

where is the number of edges linking nodes with opposite activities. By simply counting over expected numbers of edges, we can obtain the expected value of (which amounts to a mean-field approximation), yielding:(2)

Figure 3 shows the mean-field configurational energy curves for various values of the modularity on a small modular network. The local minima (metastable states) are the configurations used to store patterns. It should be noted that the mapping is highly degenerate: there are patterns with mean activity that all have the same energy.
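The relation between the total energy and the number of edges joining opposite activities can be checked numerically. The sketch below assumes that each directed edge contributes minus half the synaptic weight times the product of the two activities (so the amount gained or lost per edge is half the weight); this value and the function names are assumptions of the illustration.

```python
def configurational_energy(states, in_neighbors, omega):
    """Sum of pair energies, assuming -omega/2 * s_i * s_j per directed edge."""
    return -0.5 * omega * sum(states[i] * states[j]
                              for i, nb in enumerate(in_neighbors) for j in nb)


def opposite_edges(states, in_neighbors):
    """Number of edges linking neurons with opposite activities."""
    return sum(1 for i, nb in enumerate(in_neighbors)
               for j in nb if states[i] != states[j])
```

Under this assumption the energy equals the weight times the number of opposite-activity edges minus half the total number of edges, so any configuration in which every module is uniform and no edge crosses modules attains the lowest possible value.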

**Figure 3. Configurational energy of a network made up of modules of neurons each, according to Eq. (2), for various values of (increasing from bottom to top).**

The minima correspond to situations such that all neurons within any given module have the same sign.

doi:10.1371/journal.pone.0050276.g003

#### Forgetting avalanches

In obtaining the energy we have assumed that the number of synapses rewired from a given module is always equal to its expected value: . However, since each edge is evaluated with probability , will in fact vary somewhat from one module to another, being approximately Poisson distributed with mean . Neglecting all but the last term in Eq. (2) and approximating , the depth of the energy well corresponding to a given module is . The typical escape time from an energy well of depth at temperature is [40]. Using Stirling's approximation [] in the Poisson distribution over and expressing it in terms of , we find that the escape times are distributed according to(3)

where(4)

Therefore, at low temperatures, will behave approximately like a power law. Note also that the size of the network, , does not appear in Eqs. (3) and (4). Rather, scales with , which could be small even in the thermodynamic limit ().
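The origin of the quasi-power-law can be made explicit with a change of variables. The following is only a sketch under assumed notation: the well depth is taken to decrease linearly with the number $\nu$ of rewired synapses, with hypothetical constants $\Delta E_0$ and $b > 0$, and $\nu$ is treated as a continuous variable.

```latex
% Assumptions: \Delta E(\nu) = \Delta E_0 - b\,\nu  and  \tau = e^{\Delta E / T}.
\nu(\tau) = \frac{\Delta E_0 - T \ln \tau}{b},
\qquad
\left| \frac{d\nu}{d\tau} \right| = \frac{T}{b\,\tau},
\qquad
P(\tau) = P_{\mathrm{Poisson}}\bigl(\nu(\tau)\bigr)\, \frac{T}{b\,\tau}.
```

Since $\nu$ depends on $\tau$ only through $\ln \tau$, the Poisson factor varies slowly with $\tau$ and the result resembles a power law whose effective exponent drifts with $\tau$.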

The left panel of Fig. 4 shows the distribution of time intervals between events in which the overlap of at least one module changes sign. The power-law-like behaviour is apparent, and justifies talking about *forgetting avalanches* – since there are cascades of many forgetting events interspersed with long periods of metastability. This is very similar to the behaviour observed in other nonequilibrium settings in which power-law statistics arise from the convolution of exponentials, such as demagnetization processes [41] or Griffiths phases on networks [42].

**Figure 4. Left panel: distribution of escape times , as defined in the main text, for and , from MC simulations.**

Slope is for . Other parameters as in Fig. 2. Right panel: exponent of the quasi-power-law distribution as given by Eq. (4) for temperatures , and (from bottom to top).

doi:10.1371/journal.pone.0050276.g004

It is known from experimental psychology that forgetting in humans is indeed quite well described by power laws [28]–[30] – although most experiments to date seem to refer to slightly longer timescales than we are interested in here. The right panel of Fig. 4 shows the value of the exponent as a function of . Although for low temperatures it is almost constant over many decades of – approximating a pure power law – for any finite there will always be a such that the denominator in the logarithm of Eq. (4) approaches zero and diverges, signifying a truncation of the distribution.

Note that we have considered the information stored in a pattern to be lost once the system evolves to any other energy minimum. However, this new pattern will be highly correlated with the original one, and it might be reasonable to assume that the system has to escape from a large number of energy minima, , before the information can be considered to have been entirely forgotten. The time for this is , where are independently drawn from Eq. (3). If is sufficiently large, the distribution of times will tend to a Lévy distribution [43]. In practice, these different broad-tailed distributions [power-law, Lévy, or as given by Eq. (3)] are likely to be indistinguishable experimentally unless it is possible to observe over many orders of magnitude.

#### Clustered networks

Although we have illustrated how the mechanism of Cluster Reverberation works on a modular network, it is not actually necessary for the topology to have this characteristic – only for the patterns to be in some way “coarse-grained,” as described, and that each region of the network encoding one bit have a small enough parameter , defined as the proportion of synapses to other regions. For instance, for the famous Watts-Strogatz *small-world* model [44] – a ring of nodes, each initially connected to its nearest neighbours before a proportion of the edges are randomly rewired – we have (which is not surprising considering the resemblance between this model and the modular network used above). More precisely, the expected modularity of a randomly imposed box of neurons is

the second term on the right accounting for the edges rewired to the same box, and the third to the edges not rewired but sufficiently close to the border to connect with a different box.
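The proportion of edges leaving a box can also be measured directly on a small-world graph. The sketch below is a simplified, undirected Watts-Strogatz-style construction, with `k` counting the neighbours on both sides of a node; the functions and conventions are hypothetical and need not match those behind the expression above. It assumes that the box size `n` divides `N` and exceeds `k / 2`.

```python
import random


def ring_lattice_rewired(N, k, p, rng):
    """Undirected Watts-Strogatz-style graph as a set of (i, j) pairs, i < j.

    Each node starts linked to its k nearest neighbours (k even); every edge
    is then rewired with probability p to a uniformly chosen new endpoint.
    """
    edges = set()
    for i in range(N):
        for d in range(1, k // 2 + 1):
            j = (i + d) % N
            edges.add((min(i, j), max(i, j)))
    for (i, j) in sorted(edges):
        if rng.random() < p:
            new_j = rng.randrange(N)
            pair = (min(i, new_j), max(i, new_j))
            if new_j != i and pair not in edges:   # skip self-loops/duplicates
                edges.remove((i, j))
                edges.add(pair)
    return edges


def box_lambda(edges, n):
    """Proportion of edges joining different boxes of n consecutive nodes."""
    return sum(1 for i, j in edges if i // n != j // n) / len(edges)
```

Without rewiring, only edges near a box border cross it: with this convention each border is crossed by 1 + 2 + ... + k/2 edges, which gives a proportion (k + 2) / (4n). This is small whenever the boxes are much larger than the range of the connections.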

Perhaps a more realistic model of a clustered network would be a random network embedded in -dimensional Euclidean space. For this we shall use the scheme laid out by Rozenfeld *et al.* [45], which consists simply of allocating each node to a site on a -torus and then, given a particular degree sequence, placing edges to the nearest nodes possible – thereby attempting to minimize total edge length. For a scale-free degree sequence [i.e., a set drawn from a degree distribution ] with some exponent , such a network has, as shown in Analysis, a modularity (5)

where is the linear size of the boxes considered. It is interesting that even in this scenario, where the boxes of neurons which are to receive the same stimulus are chosen at random with no consideration for the underlying topology, these boxes need not have very many neurons for to be quite low (as long as the degree distribution is not too heterogeneous).

Carrying out the same repeated stimulation test as on the modular networks in Fig. 2, we find a similar behaviour for the scale-free embedded networks. This is shown in Fig. 5, where for high enough intensity of stimuli and scale-free exponent , performance can, as in the modular case, be . We should point out that for good performance on these networks we require more neurons for each bit of information than on modular networks with the same (in Fig. 5 we use , as opposed to in Fig. 2). However, that we should be able to obtain good results for such diverse network topologies underlines that the mechanism of Cluster Reverberation is robust and not dependent on some very specific architecture.

**Figure 5. Performance against exponent for scale-free networks, embedded on a 2D lattice, with patterns of modules of neurons each, and ; patterns are shown with intensities , , and , and (error bars represent standard deviations; lines – splines – are drawn as a guide to the eye).**

Inset: typical time series for , , and , with .

doi:10.1371/journal.pone.0050276.g005

#### Spiking neurons

In the usual spirit of determining the minimal ingredients for a mechanism to function we have, up until now, used the simplest model neurons able to exhibit CR. This approach makes for a good illustration of the main idea and allows for a certain amount of analytical understanding of the underlying phenomena. However, before CR can be considered as a plausible candidate for helping to explain short-term memory, we must check that it is compatible with more realistic neural models. For this we examine the behaviour of the popular Integrate-and-Fire (IF) model neurons – often referred to as *spiking neurons* – in the same kind of setting as described above for the simpler Amari-Hopfield neurons. In the IF model, each neuron is characterized at time by a *membrane potential* , described by the differential equation

where and are, respectively, the membrane time constant and resistance, and ; the term is the synaptic current generated by the arrival of Action Potentials (AP) from the neuron's presynaptic neighbours, is the current generated by the presentation of a particular external stimulus to the network and is an additional noisy external current. Here and are constants and is a Gaussian noise of mean and autocorrelation . Each synaptic contribution to the total synaptic current is modelled as , where represents the fraction of neurotransmitters in the synaptic cleft, which follows the dynamics [46]

Here, is the time at which an AP arrives at synapse , inducing the release of a fraction of neurotransmitters, and is the typical time-scale for neurotransmitter inactivation. Whenever surpasses a given threshold , the neuron fires an AP to all its postsynaptic neighbours and is reset to zero, then undergoing a refractory time before again becoming susceptible to input. Because the parameters and variables of this model represent measurable physiological quantities, it is possible to use it to make quantitative – albeit tentative – predictions about the timescales on which CR might be expected to be effective in a real neural system.
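The threshold, reset and refractory behaviour of a single neuron can be illustrated with a short Euler integration. This sketch assumes the standard leaky form, in which the membrane time constant times dV/dt equals minus V plus the resistance times the input current. It replaces the synaptic, stimulus and noise currents with one constant current. All parameter values and names are hypothetical placeholders, not those of the original simulations.

```python
def simulate_lif(current_pA, t_max_ms, dt_ms=0.1, tau_m_ms=10.0,
                 r_m_gohm=0.1, v_th_mv=8.0, t_ref_ms=5.0):
    """Euler integration of tau_m dV/dt = -V + R_m I; returns spike times (ms).

    All parameter values here are hypothetical placeholders.
    pA * GOhm = mV, so r_m_gohm * current_pA is in millivolts.
    """
    v = 0.0
    refractory_left = 0.0
    spikes = []
    steps = int(round(t_max_ms / dt_ms))
    for step in range(steps):
        t = step * dt_ms
        if refractory_left > 0.0:
            refractory_left -= dt_ms      # neuron ignores input while refractory
            continue
        v += dt_ms / tau_m_ms * (-v + r_m_gohm * current_pA)
        if v > v_th_mv:
            spikes.append(t)
            v = 0.0                        # reset to zero after the spike
            refractory_left = t_ref_ms
    return spikes
```

With these placeholder values, a current whose steady-state voltage lies below threshold never produces a spike. A stronger current yields regular firing with a period set by the charging time plus the refractory time.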

Figure 6 is a raster plot of a modular network of IF neurons. The system performs a short-term memory task akin to the one previously described for the Amari-Hopfield neural network: the neurons belonging to clusters that correspond to ones in a random pattern are stimulated, for ms, with an intensity , while the remaining neurons receive an opposite stimulus, . We then allow the system to evolve for ms, before choosing a new random pattern and stimulating again. In such tests, the neurons in positively stimulated clusters usually begin to oscillate in synchrony, while the rest remain silent (save for occasional individual APs caused by noise). However, since this is a metastable state, with time active clusters can suddenly go mostly silent, or the neurons in silent clusters begin spontaneously to fire in synchrony. Thus, the information is gradually lost, as in the case with simpler neurons.

**Figure 6. Raster plot, obtained from MC simulations, of a network of integrate-and-fire (IF) neurons wired up (as described in the main text) in groups of , with a rewiring probability .**

Every ms, a new pattern is shown for ms with an intensity pA (plotted in blue). Parameters for the neurons are pA, mV, ms, ms, , and ms, which are all within the physiological range; and the external noisy current is modelled with pA and pA ms.

doi:10.1371/journal.pone.0050276.g006

To gauge how well the system is performing the task, we look at each cluster for the last ms before the next stimulus and assign a value to its mean activity if it is active, and if it is silent. We then define the performance as: (6)

In Fig. 7 we show the values of obtained in MC simulations against . Using different values of we observe a similar behaviour to that of Fig. 2. In particular, for pA, we have the interesting nonmonotonic behaviour in which performance benefits from a certain degree of rewiring. While, for the sake of illustration, in Fig. 6 we only show the evolution of the system for ms after stimulation, in Fig. 7 we wait for five seconds. Although the model is too simple, and the network too small, to make quantitative predictions about the brain, it is nevertheless promising that with physiologically realistic parameters we observe high performance () over several seconds, since this is the timescale on which short-term memory operates in humans.

**Figure 7. Performance against rewiring for modular networks of IF neurons, as obtained from MC simulations.**

The network is periodically stimulated with a new random pattern for ms with an intensity pA (green squares), pA (red circles) and pA (blue triangles) (error bars represent standard deviations; lines – splines – are drawn as a guide to the eye). The system evolves in the absence of stimuli for ms and performance, , is computed according to Eq. (6). (An interval of seconds corresponds roughly to the timescale on which short-term memory operates in the brain.) Other parameters are as in Fig. 6.

doi:10.1371/journal.pone.0050276.g007

### Discussion

Cluster Reverberation may be a means available to neural systems for performing certain short-term tasks, such as sensory memory or working memory. To the best of our knowledge, it is the first mechanism proposed to use network properties with no need of synaptic learning. All that is required is for the underlying network to be highly clustered or modular, and for small groups of neurons in some sense to store one bit of information, as opposed to a conventional view which assumes one bit per neuron. Considering the enormous number of neurons in the brain, and the fact that real neurons are possibly too noisy to store information individually anyway, these hypotheses do not seem far-fetched. The mechanism is furthermore consistent with what is known about the structure of biological neural networks, with experiments that have revealed power-law statistics of forgetting, and with recent observations of locally synchronized synaptic activity.

For the sake of illustration, we have focused here on the simplest model neurons that are able to exhibit the behaviour of interest. However, we have shown how the mechanism can also work with the slightly more sophisticated Integrate-and-Fire neurons, and there is no reason to believe that it would not also be viable with more realistic models, or even actual cells. Although CR comes about thanks to the high modularity of small groups of neurons, we have shown how robust it is to the details of the topology by carrying out simulations on clustered networks with no explicitly built-in modularity. And this setting suggests an interesting point. If an initially homogeneous (i.e., neither modular nor clustered) area of brain tissue were repeatedly stimulated with different patterns in the same way as we have done in our simulations, then synaptic plasticity mechanisms (LTP and LTD) might be expected to alter the network structure in such a way that synapses within each of the imposed modules would all tend to become strengthened, while inter-module synapses would vary their weights in accordance with the details of the patterns being shown [47]. The result would be a modular structure conducive to efficient CR for arbitrary patterns, with simultaneous Hebbian learning in the inter-synapses of the actual patterns shown. In this way, the same network might be capable of both short-term and long-term memory, explaining, perhaps, why our brains can indeed store completely novel information but usually with a certain bias in favour of what we are expecting to perceive.

Although we have not gone into the question of neural coding, there would seem to be an intrinsic difference between *semantic* storage of information and *sensory* storage. The former is used for long-term memory and is probably useful for certain working-memory tasks that require the labelling of previously learned information. For the latter, some mechanism such as the one proposed here must store novel information immediately, in a way similar to, but more efficient than, how the retina retains the pigmentation left by an image to which it was recently exposed. If novel sensory information were held for long enough in metastable states, Hebbian learning (either in the same or other areas of the brain) could take place and the information could thereafter be stored indefinitely. This might constitute the essence of concentrating so as to memorise a recent stimulus.

Finally, we should mention that CR could work in conjunction with other mechanisms, such as processes leading to cellular bistability, making these more robust to noise and augmenting their efficacy. Whether CR would work for biological neural systems could perhaps be put to the test by growing such modular networks *in vitro*, stimulating them appropriately, and observing the duration of the metastable states [34], [35]. *In vivo* recordings of neural activity during short-term memory tasks, together with a mapping of the underlying synaptic connections, might be used to ascertain whether the brain could indeed harness this mechanism. For this it must be borne in mind that the neurons forming a module need not find themselves close together in metric space, and that effective modularity might come about via stronger intra- than inter-connexions, instead of simply through a higher density of synapses within the clusters. We hope that observations and experiments such as these will be carried out and eventually reveal something more about the basis of this puzzling emergent property of the brain known as thought.

### Analysis

#### The effect of noise

On a random network (), the Amari-Hopfield model described in the main text has a second-order phase transition with temperature, , at [4]. This can be seen by considering the mean-field equation for the overlap at the steady state, , where we have substituted in Eq. (1). For , the paramagnetic solution becomes unstable, and ferromagnetic solutions appear [36]. This result also holds for the modular networks described in the main text. However, the fact that the global overlap is different from zero does not mean that the short-term memory configurations we are interested in are stable. In fact, we know they are metastable for any (see Results: Energy and topology), but we can set an upper bound on the temperature at which these states can be maintained even for a short time by considering again the mean-field equation for such a configuration. For a neuron in module , . For patterns with mean activity zero (), states will be unstable if .
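The appearance of a non-zero overlap below the critical temperature can be illustrated numerically. The following is a minimal sketch, assuming the textbook single-pattern mean-field form m = tanh(m/T), whose critical temperature is T = 1; this form, the symbols `m` and `T`, and the function name `steady_state_overlap` are illustrative assumptions rather than expressions taken from the main text. The code iterates the self-consistency equation until it reaches a fixed point.

```python
import math


def steady_state_overlap(T, m0=0.5, tol=1e-12, max_iter=100_000):
    """Fixed-point iteration of the assumed mean-field equation m = tanh(m / T).

    T  : temperature (noise level), must be positive
    m0 : initial overlap; a non-zero start is needed to leave m = 0
    """
    m = m0
    for _ in range(max_iter):
        m_new = math.tanh(m / T)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m  # not converged within max_iter; return the last iterate


if __name__ == "__main__":
    # Below T = 1 the iteration settles on a non-zero (ferromagnetic) overlap;
    # above T = 1 it decays towards the paramagnetic solution m = 0.
    for T in (0.5, 0.9, 1.1, 1.5):
        print(f"T = {T}: m = {steady_state_overlap(T):.6f}")
```

Convergence slows down close to T = 1, as expected near a second-order transition, so the iteration cap matters most in that region.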

As we saw from Fig. 2, for stimuli , the system does not always leave whichever metastable state it is in to go perfectly to the pattern shown. A degree of “structural noise” () can lead to a better response. In the same way, the dynamical noise set by can improve performance. Figure 8 shows how performance varies with for different values of . Due to the trade-off between sensitivity to stimuli and stability of the memory states, there is in general an optimum level of noise at which the system performs best. These dynamics can be interpreted as a kind of stochastic resonance, with the stimuli playing the part of the periodic forcing typically seen in such systems [48]. Both the dynamic (annealed) noise, , and the structural (quenched) noise, , serve to increase the sensitivity of the system to stimuli.

**Figure 8. Performance against temperature for the Hopfield-Amari networks described in the main text, obtained from MC simulations, for four values of the rewiring parameter and a single stimulus setting.** All other parameters as in Fig. 2. (Error bars represent standard deviations; lines, which are splines, are drawn as a guide to the eye.)

It is interesting to observe in Fig. 8 that whereas highly modular networks () are most robust to , for no values of parameters do they exhibit as good performance as the less modular networks when is relatively low.

#### Effective modularity of clustered networks

We wish to estimate , the proportion of edges that cross the boundaries of a box of linear size placed randomly on a network embedded in -dimensional space according to the scheme laid out in Ref. [45]. The number of nodes within a radius is , with a constant. We shall therefore assume a node with degree to have edges to all nodes up to a distance , and none beyond (note that this is not necessarily always feasible in practice). To estimate , we shall first calculate the probability that a randomly chosen edge have length . The chance that the edge belong to a node with degree is (where is the degree distribution). The proportion of edges that have length among those belonging to a node with degree is if , and otherwise. Considering, for example, scale-free networks (as in Ref. [45]), so that the degree distribution is in some interval , and integrating over , we have the distribution of lengths,

where we have assumed, for simplicity, that the network is sufficiently sparse that , , and where we have normalised for the interval ; strictly, , but we shall also ignore this effect. Next we need the probability that an edge of length fall between two compartments of linear size . This depends on the geometry of the situation as well as dimensionality; however, a first approximation which is independent of such considerations is

We can now estimate the modularity as (7)
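For concreteness, the chain of estimates above can be written out in explicit notation. In this sketch the symbols $A$, $d$, $k$, $\gamma$, $l$, $r$, $a$ and $\lambda$ are our own labels. The power-law form of the length distribution, its normalisation on $[1,\infty)$, and the crossing probability $\min(l/r, 1)$ are assumptions chosen to match the verbal description, so the final expression should be read as a plausible form of the estimate rather than as a quotation of Eq. (7).

```latex
% Assumed notation: n(l) = A l^d nodes lie within radius l; the degree
% distribution is p(k) \propto k^{-\gamma} with \gamma > 2; boxes have linear size r.
\begin{align}
  \pi(k) &\propto k\,p(k), \qquad
  \nu(l\,|\,k) = \frac{d A\, l^{d-1}}{k}\ \text{ if } A l^{d} \le k,\ \text{ and } 0 \text{ otherwise},\\
  P(l) &\propto \int_{A l^{d}}^{\infty} \pi(k)\,\nu(l\,|\,k)\,dk
        \;\propto\; l^{d-1}\int_{A l^{d}}^{\infty} k^{-\gamma}\,dk
        \;\propto\; l^{-(a+1)}, \qquad a \equiv d(\gamma-2),\\
  P(l) &= a\, l^{-(a+1)} \quad (l \ge 1), \qquad
  P_{\mathrm{out}}(l) \simeq \min\!\left(\frac{l}{r},\, 1\right),\\
  \lambda &\simeq \int_{1}^{\infty} P_{\mathrm{out}}(l)\,P(l)\,dl
          = \frac{a}{r}\int_{1}^{r} l^{-a}\,dl + \int_{r}^{\infty} a\,l^{-(a+1)}\,dl
          = \frac{a\,r^{-1} - r^{-a}}{a-1} \qquad (a \neq 1).
\end{align}
```

As a consistency check, $r = 1$ gives $\lambda = 1$ (every edge leaves a box of unit size), and for $a > 1$ the estimate decays as $a/[(a-1)\,r]$ for large $r$.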

Figure 9 compares this expression with the value obtained numerically after averaging over many network realizations, and shows that it is fairly good – considering the approximations used for its derivation.
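A comparison of this kind can be mimicked in a few lines. The sketch below does not embed a network on a lattice. Instead it assumes that edge lengths follow a power law P(l) = a l^-(a+1) on l ≥ 1, with a = d(γ − 2), and that an edge of length l crosses a box boundary with probability min(l/r, 1). Under those assumptions (both illustrative, as are all the function names) it compares the resulting closed form with a direct Monte Carlo average.

```python
import math
import random


def crossing_fraction_closed_form(r, gamma, d):
    """Average of min(l/r, 1) over the assumed density a * l**-(a + 1), l >= 1."""
    a = d * (gamma - 2.0)  # assumed tail exponent of the edge-length distribution
    if math.isclose(a, 1.0):
        return (1.0 + math.log(r)) / r  # limit of the expression below as a -> 1
    return (a / r - r ** (-a)) / (a - 1.0)


def crossing_fraction_monte_carlo(r, gamma, d, samples=200_000, seed=0):
    """Sample edge lengths from the assumed power law and average the crossing probability."""
    a = d * (gamma - 2.0)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        # Inverse-transform sampling: (1 - u)**(-1/a) has density a * l**-(a + 1) on l >= 1.
        length = (1.0 - rng.random()) ** (-1.0 / a)
        total += min(length / r, 1.0)
    return total / samples


if __name__ == "__main__":
    for gamma in (2.5, 3.0, 3.5):
        exact = crossing_fraction_closed_form(r=10.0, gamma=gamma, d=2)
        estimate = crossing_fraction_monte_carlo(r=10.0, gamma=gamma, d=2)
        print(f"gamma = {gamma}: closed form {exact:.4f}, Monte Carlo {estimate:.4f}")
```

Agreement here only confirms the integration step. Any discrepancy with simulations of actual embedded networks would come from the approximations made in deriving the length distribution and the crossing probability.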

**Figure 9. Proportion of outgoing edges from boxes of a given linear size against the degree-distribution exponent, for scale-free networks embedded on lattices.** Lines from Eq. (7) and symbols (with error bars representing standard deviations) from simulations.

### Acknowledgments

Many thanks to Jorge F. Mejias, Sebastiano de Franciscis, Miguel A. Muñoz, Sabine Hilfiker, Peter E. Latham, Ole Paulsen and Nick S. Jones for useful comments and suggestions.

### Author Contributions

Conceived and designed the experiments: SJ. Performed the experiments: SJ JJT. Analyzed the data: SJ JM JJT. Wrote the paper: SJ JM JJT.

### References

- 1. Hebb DO (1949) The organization of behavior. New York: Wiley.
- 2. Amari S (1972) Characteristics of random nets of analog neuron-like elements. IEEE Trans Syst Man Cybern 2: 643–657. doi: 10.1109/tsmc.1972.4309193
- 3. Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA 79: 2554–2558. doi: 10.1073/pnas.79.8.2554
- 4. Amit DJ (1989) Modeling brain function. Cambridge: Cambridge Univ Press.
- 5. Gruart A, Muñoz MD, Delgado-García JM (2006) Involvement of the CA3-CA1 synapse in the acquisition of associative learning in behaving mice. J Neurosci 26: 1077–1087. doi: 10.1523/jneurosci.2834-05.2006
- 6. De Roo M, Klauser P, Mendez P, Poglia L, Muller D (2008) Activity-dependent PSD formation and stabilization of newly formed spines in hippocampal slice cultures. Cereb Cortex 18: 151–161. doi: 10.1093/cercor/bhm041
- 7. Durstewitz D, Seamans JK, Sejnowski TJ (2000) Neurocomputational models of working memory. Nat Neurosci 3: 1184–1191. doi: 10.1038/81460
- 8. Lee KS, Schottler F, Oliver M, Lynch G (1980) Brief bursts of high-frequency stimulation produce two types of structural change in rat hippocampus. J Neurophysiol 44: 247–258.
- 9. Klintsova AY, Greenough WT (1999) Synaptic plasticity in cortical systems. Curr Opin Neurobiol 9: 203–208. doi: 10.1016/s0959-4388(99)80028-2
- 10. Sperling GA (1960) The information available in brief visual presentations. Psychol Monogr 74: 1–30.
- 11. Cowan N (1984) On short and long auditory stores. Psychol Bull 96: 341–370. doi: 10.1037/0033-2909.96.2.341
- 12. Baddeley AD (1999) Essentials of human memory. London: Psychology Press.
- 13. Baddeley A (2003) Working memory: looking back and looking forward. Nat Rev Neurosci 4: 829–839. doi: 10.1038/nrn1201
- 14. Conrad R (1964) Acoustic confusion in immediate memory. B J Psychol 55: 75–84. doi: 10.1111/j.2044-8295.1964.tb00899.x
- 15. Conrad R (1964) Information, acoustic confusion and memory span. B J Psychol 55: 429–432. doi: 10.1111/j.2044-8295.1964.tb00928.x
- 16. Barak O, Tsodyks M (2007) Persistent activity in neural networks with dynamic synapses. PLoS Comput Biol 3: e35. doi: 10.1371/journal.pcbi.0030035.eor
- 17. Roudi Y, Latham PE (2007) A balanced memory network. PLoS Comput Biol 3: e141. doi: 10.1371/journal.pcbi.0030141.eor
- 18. Mongillo G, Barak O, Tsodyks M (2008) Synaptic theory of working memory. Science 319: 1543–1546. doi: 10.1126/science.1150769
- 19. Wang XJ (2001) Synaptic reverberation underlying mnemonic persistent activity. Trends Neurosci 24: 455–463. doi: 10.1016/s0166-2236(00)01868-3
- 20. Chialvo DR, Cecchi GA, Magnasco MO (2000) Noise-induced memory in extended excitable systems. Phys Rev E 61: 5654–5657. doi: 10.1103/physreve.61.5654
- 21. Camperi M, Wang XJ (1998) A model of visuospatial working memory in prefrontal cortex: recurrent network and cellular bistability. J Comp Neurosci 5: 383–405.
- 22. Teramae JN, Fukai T (2005) A cellular mechanism for graded persistent activity in a model neuron and its implications for working memory. J Comput Neurosci 18: 105–121. doi: 10.1007/s10827-005-5474-6
- 23. Tarnow E (2008) Short term memory may be the depletion of the readily releasable pool of presynaptic neurotransmitter vesicles. Cogn Neurodyn 3: 263–269. doi: 10.1007/s11571-009-9085-1
- 24. Compte A, Constantinidis C, Tegnér J, Raghavachari S, et al. (2003) Temporally irregular mnemonic persistent activity in prefrontal neurons of monkeys during a delayed response task. J Neurophysiol 90: 3441–3454. doi: 10.1152/jn.00949.2002
- 25. Sporns O, Chialvo DR, Kaiser M, Hilgetag CC (2004) Organization, development and function of complex brain networks. Trends Cogn Sci 8: 418–425. doi: 10.1016/j.tics.2004.07.008
- 26. Johnson S, Marro J, Torres JJ (2010) Evolving networks and the development of neural systems. J Stat Mech P03003 doi: 10.1088/1742-5468/2010/03/p03003
- 27. Lewis TJ, Rinzel J (2000) Self-organized synchronous oscillations in a network of excitable cells coupled by gap junctions. Comput Neural Syst 11: 299–320. doi: 10.1088/0954-898x/11/4/304
- 28. Wixted JT, Ebbesen EB (1991) On the form of forgetting. Psychol Sci 2: 409–415. doi: 10.1111/j.1467-9280.1991.tb00175.x
- 29. Wixted JT, Ebbesen EB (1997) Genuine power curves in forgetting: A quantitative analysis of individual subject forgetting functions. Mem Cognition 25: 731–739. doi: 10.3758/bf03211316
- 30. Sikström S (2002) Forgetting curves: implications for connectionist models. Cognitive Psychol 45: 95–152. doi: 10.1016/s0010-0285(02)00012-9
- 31. Takahashi N, Kitamura K, Matsuo N, Mayford M, Kano M, et al. (2012) Locally synchronized synaptic inputs. Science 335: 353–356. doi: 10.1126/science.1210362
- 32. Kleindienst T, Winnubst J, Roth-Alpermann C, Bonhoeffer T, Lohmann C (2011) Activity-dependent clustering of functional synaptic inputs on developing hippocampal dendrites. Neuron 72: 1012–1024. doi: 10.1016/j.neuron.2011.10.015
- 33. Makino H, Malinow R (2011) Compartmentalized versus global synaptic plasticity on dendrites controlled by experience. Neuron 72: 1001–1011. doi: 10.1016/j.neuron.2011.09.036
- 34. Kohl MM, Shipton OA, Deacon RM, Rawlins JN, Deisseroth K, et al. (2011) Hemisphere-specific optogenetic stimulation reveals left-right asymmetry of hippocampal plasticity. Nat Neurosci 14: 1413–1415. doi: 10.1038/nn.2915
- 35. Shein-Idelson M, Ben-Jacob E, Hanein Y (2011) Engineered neuronal circuits: A new platform for studying the role of modular topology. Front Neuroeng 4: 10. doi: 10.3389/fneng.2011.00010
- 36. Fraiman D, Balenzuela P, Foss J, Chialvo D (2009) Ising-like dynamics in large-scale functional brain networks. Phys Rev E 79: 61922–61931. doi: 10.1103/physreve.79.061922
- 37. Newman MEJ (2003) The structure and function of complex networks. SIAM Rev. 45: 167–256. doi: 10.1137/s003614450342480
- 38. Johnson S, Marro J, Torres JJ (2008) Functional optimization in complex excitable networks. EPL 83: 46006. doi: 10.1209/0295-5075/83/46006
- 39. Torres JJ, Varona P (2011) Modeling biological neural networks, in Handbook of natural computing, Rozenberg G, Bäck THW, Kok JN (Eds). Berlin: Springer-Verlag.
- 40. Levine RD (2005) Molecular reaction dynamics. Cambridge: Cambridge University Press.
- 41. Hurtado PI, Marro J, Garrido PL (2008) Demagnetization via nucleation of the nonequilibrium metastable phase in a model of disorder. J Stat Phys 133: 29–58. doi: 10.1007/s10955-008-9602-3
- 42. Muñoz MA, Juhasz R, Castellano C, Odor G (2010) Griffiths phases in complex networks. Phys Rev Lett 105: 128701. doi: 10.1103/physrevlett.105.128701
- 43. Gnedenko BV, Kolmogorov AN (1954) Limit distributions for sums of independent random variables. Cambridge: Addison-Wesley.
- 44. Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 395: 440–442. doi: 10.1038/30918
- 45. Rozenfeld AF, Cohen R, ben-Avraham D, Havlin S (2002) Scale-free networks on lattices. Phys Rev Lett 89: 218701. doi: 10.1103/physrevlett.89.218701
- 46. Tsodyks MV, Markram H (1997) The neural code between neocortical pyramidal neurons depends on neurotransmitter release probability. Proc Natl Acad Sci USA 94: 719–723. doi: 10.1073/pnas.94.2.719
- 47. van Aerde KI, Mann EO, Canto CB, Heistek TS, Linkenkaer-Hansen K, et al. (2009) Flexible spike timing of layer 5 neurons during dynamic beta-oscillation shifts in rat prefrontal cortex. J Physiol 587: 5177–5196. doi: 10.1113/jphysiol.2009.178384
- 48. Benzi R, Sutera A, Vulpiani A (1981) The mechanism of stochastic resonance. J Phys A 14: 453–457. doi: 10.1088/0305-4470/14/11/006