Reader Comments


Missing important references and correction concerning the comparison with momentary information transfer

Posted by jakobrunge on 30 Apr 2013 at 09:10 GMT

In their article Wibral et al. (2013), the authors propose a measure of interaction delays rooted in an information-theoretic framework. Their measure, named TE_SPO where SPO stands for self-prediction optimality, is a time-delayed extension to transfer entropy that becomes maximal only at the actual interaction delay as proven in their paper.
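The delay-scanning idea behind such a measure can be illustrated with a toy sketch (the model, delay value, and noise level below are invented for illustration and are not taken from Wibral et al.): a delayed transfer entropy TE(u) = I(y(t); x(t-u) | y(t-1)) is estimated for a range of candidate delays u, and the delay maximizing it is reported. For a binary source whose noisy copy reaches the target after 2 samples, a simple plug-in estimate recovers that delay:

```python
import random
from collections import Counter
from math import log2

def cmi(triples):
    """Plug-in estimate of I(A; B | C) from a list of discrete (a, b, c) triples."""
    n = len(triples)
    abc = Counter(triples)
    ac = Counter((a, c) for a, _, c in triples)
    bc = Counter((b, c) for _, b, c in triples)
    cc = Counter(c for *_, c in triples)
    return sum(m / n * log2(m * cc[c] / (ac[(a, c)] * bc[(b, c)]))
               for (a, b, c), m in abc.items())

random.seed(0)
TRUE_DELAY = 2  # invented interaction delay of this toy model
x = [random.randint(0, 1) for _ in range(20000)]
# y: noisy copy of x shifted by TRUE_DELAY (about 5% of bits flipped)
y = [0] * TRUE_DELAY + [x[t - TRUE_DELAY] ^ (random.random() < 0.05)
                        for t in range(TRUE_DELAY, len(x))]

def delayed_te(u):
    # delayed transfer entropy TE(u) = I(y(t); x(t-u) | y(t-1))
    return cmi([(y[t], x[t - u], y[t - 1]) for t in range(6, len(x))])

scan = {u: delayed_te(u) for u in range(1, 6)}
print(max(scan, key=scan.get))  # -> 2, the maximum sits at the true delay
```

Here the source has no internal memory, so the scanned maximum coincides with the transfer delay; the discussion below turns on what happens when this and related assumptions are relaxed.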

Wibral et al. only considered the bivariate coupling case. Prior to their publication, Runge et al. (2012b,a) discussed the theory of detecting and quantifying causal interactions and their strength for the general multivariate case. However, Wibral et al. present an interesting complementary approach in that they propose to determine the interaction delay based on the reconstructed (vector-valued) states rather than the (scalar) observations of the complex systems under study.

Wibral et al. contrast their measure with a similar information-theoretic measure, the momentary information transfer (MIT), introduced in Pompe and Runge (2011). Wibral et al. write that “A major conceptual difference between the Pompe and Runge study and ours is that no formal proof of the maximality of their functional MIT at the correct interaction delay is given, and as we argue below cannot be given.” Rather than proving that the maximality of MIT cannot be given, they construct a model example for which they find the MIT unable to serve the purpose of inferring the interaction delay. But, as shown below, the reasoning in their “maximality” proof equally applies to the MIT and thus cannot be used to disregard MIT. Rather, their example model seems to not fulfill the assumptions implicitly used in their own proof.

The “formal proof of the maximality” of MIT was actually given in the above-mentioned subsequent works, Runge et al. (2012b,a). These works further developed the original idea of Pompe and Runge (2011) into a two-step approach. In the first step, the causal graph (conditional independence graph) is reconstructed (Runge et al., 2012b) using the well-established framework of graphical models, in which the property of capturing the correct causal interaction delays is a trivial consequence of separation properties in the graph. Runge et al. (2012b) also state the underlying assumptions for such an inference. In the second step, the MIT is used as a measure of the coupling strength solely of the causal links, i.e., the inferred interaction delays, in this graph (Runge et al., 2012a).

Wibral et al. also appear to have overlooked that their estimator of conditional mutual information, given by their Eq. (19), had already been developed in Frenzel and Pompe (2007). The latter work also discussed the inference of interaction delays.

Our statements are detailed in a preprint downloadable from www.pik-potsdam.de/member... and on the arXiv.

References:
Pompe, B., & Runge, J. (2011). Momentary information transfer as a coupling measure of time series. Phys. Rev. E, 83(5), 1–12. doi:10.1103/PhysRevE.83.051122
Runge, J., Heitzig, J., Petoukhov, V., & Kurths, J. (2012b). Escaping the Curse of Dimensionality in Estimating Multivariate Transfer Entropy. Phys. Rev. Lett., 108(25), 1–4. doi:10.1103/PhysRevLett.108.258701
Runge, J., Heitzig, J., Marwan, N., & Kurths, J. (2012a). Quantifying Causal Coupling Strength: A Lag-specific Measure For Multivariate Time Series Related To Transfer Entropy. Phys. Rev. E, 86(6), 1–15. doi:10.1103/PhysRevE.86.061121
Frenzel, S., & Pompe, B. (2007). Partial Mutual Information for Coupling Analysis of Multivariate Time Series. Phys. Rev. Lett., 99(20), 204101. doi:10.1103/PhysRevLett.99.204101

No competing interests declared.

RE: Missing important references and correction concerning the comparison with momentary information transfer

MichaelW replied to jakobrunge on 30 Apr 2013 at 13:49 GMT

We are grateful to Jakob Runge for pointing out the additional work that has been done since the original publication on MIT and look forward to a fruitful discussion on this forum.
In response to the above comment we would like to stress three important points, however:
(1) In the example given in our study, MIT does not recover the correct delay. We give a quantitative explanation of this failure by detailing how MIT measures the time interval between the time point when the information in question is first seen in the source (where it may be stored for a while, invisible to the target, before being transferred) and the time point when it is seen in the target. Hence, MIT measures the sum of the information storage time in the source and the actual transfer delay. We believe that this summed time is a useful quantity in itself, but it is not exactly the information transfer delay, as was claimed incorrectly. The incorrectness of this claim was demonstrated by example in our study. The formulation of MIT based on scalar observables makes it impossible for variables to store information in the same way system states do, which is possibly why this subtlety was overlooked.
(2) Information transfer as defined by Schreiber is based on (Markov) states of systems and explicitly NOT on scalar observations (see the definition of transfer entropy on page 462 in Schreiber, Phys. Rev. Lett. 85(2), 2000). As a consequence, a failure to reconstruct the systems' states properly (i.e. by using scalar observations) may yield spurious results -- a simple example of this phenomenon was already given in Vicente et al., J. Comp. Neurosci., 2011.
(3) The Frenzel and Pompe (2007) partial mutual information estimator mentioned in the above comment is not a transfer entropy estimator. Therefore it does not measure information TRANSFER as defined in Schreiber 2000. Schreiber clearly separates estimators for information transfer from lagged mutual information estimators in all generality. The Frenzel and Pompe estimator is therefore quite different from our estimator, both in formulation and in purpose. As far as the actual entropy estimation technique employed by Frenzel and Pompe is concerned, we cite the source of the idea to use neighbour statistics: Kraskov et al., Phys. Rev. E 69, 066138 (2004). This is the same source that Frenzel and Pompe cite for their estimator. We also note that the possibility of estimating transfer entropy functionals using this technique was already demonstrated in Kraskov's PhD thesis from 2004.

Competing interests declared: I am the author of the study commented on.

RE: RE: Missing important references and correction concerning the comparison with momentary information transfer

MichaelW replied to MichaelW on 30 Apr 2013 at 14:18 GMT

It would be good to obtain the arXiv URL for the preprint mentioned in the comment by Runge, as the URL to the pik-potsdam website requires a login.

No competing interests declared.

RE: Missing important references and correction concerning the comparison with momentary information transfer

MichaelW replied to jakobrunge on 02 May 2013 at 08:10 GMT

In his comment Jakob Runge claims that MIT fails to recover the correct delay in our example because "...Rather, their example model seems to not fulfill the assumptions implicitly used in their own proof."

We have carefully checked this and conclude:
(a) We are not aware of any assumptions that are implicit in our proof. The relevant assumptions are given explicitly. The confusion about this point (i.e. implicit assumptions) may arise from the fact that both our proof and the graph supporting it are based on a state space notation (bold typeface) that seems to have been misunderstood as a representation of a scalar process.
(b) The (explicit) assumptions in our proof are simply that the two processes in question have a state space representation, i.e. are Markovian. It is easy to see for our test case I that p(x(t) | x(t-1), x(t-2)) = p(x(t) | x(t-1)), i.e. X is a Markov chain of order 1, in this case even in scalar representation. If it were not, it could of course be transformed into a Markov chain of order 1 in a state space representation. The same holds for y(t), which is a noisy copy of x(t). We note that both processes are also stationary, although this is not necessary for our proof. Hence the causal graph of the processes in our proof (which is explicitly given in a state space representation in figure 2, so that all Markov processes are of order one) does apply to our example.
Hence, we conclude that the failure of MIT to reconstruct the correct delay is not due to our example not meeting "implicit assumptions" of our proof.
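An order-1 Markov property of the kind invoked in (b) can be checked empirically on a scalar realization. The following sketch uses a generic toy binary chain (an invented example, not the paper's test case I): the plug-in estimate of I(x(t); x(t-2) | x(t-1)) essentially vanishes for an order-1 chain, while the unconditioned lag-2 mutual information does not:

```python
import random
from collections import Counter
from math import log2

def cmi(triples):
    """Plug-in estimate of I(A; B | C) from a list of discrete (a, b, c) triples."""
    n = len(triples)
    abc = Counter(triples)
    ac = Counter((a, c) for a, _, c in triples)
    bc = Counter((b, c) for _, b, c in triples)
    cc = Counter(c for *_, c in triples)
    return sum(m / n * log2(m * cc[c] / (ac[(a, c)] * bc[(b, c)]))
               for (a, b, c), m in abc.items())

random.seed(0)
# toy order-1 binary Markov chain: x(t) repeats x(t-1) with probability 0.8
x = [0]
for _ in range(50000):
    x.append(x[-1] if random.random() < 0.8 else 1 - x[-1])

# for an order-1 chain, I(x(t); x(t-2) | x(t-1)) should vanish ...
markov_gap = cmi([(x[t], x[t - 2], x[t - 1]) for t in range(2, len(x))])
# ... while the unconditioned lag-2 mutual information does not
# (a constant third entry reduces the plug-in CMI to a plain MI)
mi_lag2 = cmi([(x[t], x[t - 2], 0) for t in range(2, len(x))])
print(markov_gap, mi_lag2)  # markov_gap near 0, mi_lag2 clearly positive
```

For finite samples the conditional term is only approximately zero (the plug-in estimator has a small positive bias), so in practice a significance test would be used to decide whether it vanishes.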

No competing interests declared.

RE: Missing important references and correction concerning the comparison with momentary information transfer

MichaelW replied to jakobrunge on 02 May 2013 at 08:19 GMT

Jakob Runge states that our proof also holds for MIT, and gives a formal argument in arXiv:1304.7930.
We note that for their argument they use the causal graph in our figure 2. However, this causal graph is only valid if the quantities included as nodes in the graph are states in a state space. If the quantities in the graph represent any other (smaller) collections of variables, the graph loses its generality and does not represent generic Markov processes, and d-separation cannot be shown using this graph. Since MIT is not based on a state space representation, our graph may not be used to prove properties of MIT. Hence, the derivation given in arXiv:1304.7930 so far lacks the necessary prerequisites to arrive at their formula 2.

No competing interests declared.

RE: Missing important references and correction concerning the comparison with momentary information transfer

MichaelW replied to jakobrunge on 02 May 2013 at 08:25 GMT

Jakob Runge states that the failure of MIT to recover the correct interaction delay seems to be because "Rather, their example model seems to not fulfill the assumptions implicitly used in their own proof."

We are not aware of what these implicit assumptions would be and note that the example processes in test case I have a state space representation and are Markovian of order 1; hence, they do fulfill the assumptions necessary for our proof to hold.
We also note that our transfer entropy estimator does indeed recover the correct delay for this process, as predicted by our proof.

No competing interests declared.

RE: RE: Missing important references and correction concerning the comparison with momentary information transfer

jakobrunge replied to MichaelW on 11 May 2013 at 10:20 GMT

I thank Michael Wibral for stressing the distinction between states and scalar observations. This is really an important and interesting subject that needs to be addressed further.

But I think that in order to really understand how state-based causal inference can be distinguished from the scalar framework, it would be very helpful if a formal definition of the state-based graph as drawn in Fig. 2 could be given, i.e., how nodes and how links are defined. Wibral et al. refer to the Ragwitz criterion, which is an optimization scheme for the embedding-delay vector. It would be good to discuss the results of this scheme on a simple autoregressive process for which the scalar theory is well known.
Based on this definition one can then talk more precisely about how this defined graph can be estimated using the proposed measure and how it differs from scalar conditional independence graphs (i.e., the time series graphs used in Runge et al. (2012a,b)).
After all, the reconstruction of STATES comes from studying the independence structure of SCALAR observations (only these are measured!) and, thus, the scalar- and state-based concepts of information transfer probably share a lot in common.

In Runge et al. (2012a,b) the proper definition of scalar time series graphs allowed us to utilize known statistical concepts (like the PC algorithm for graph estimation) and the theorem by Eichler (2012) that gives the conditions for Markov properties to hold.

In Runge et al. (2012b) we also give a modified definition of MIT, called MITN, that measures the influence of X together with its parents on Y. The latter definition involves a vector and seems similar to the states proposed by Wibral et al., albeit MITN is already phrased in the multivariate context. Again, this interesting topic needs to be addressed further.

Regarding the difference between the estimator proposed by Wibral et al. and the one by Frenzel and Pompe (2007): Generally, Transfer Entropy IS a conditional mutual information I(X,Y|Z), just one based on states. It seems to me that states are simply vectors of scalar observations, and X (or Y and Z) can, therefore, be multidimensional. But this changes the estimation formula only in that distances between vectors rather than scalars are considered. The crucial point of the Frenzel and Pompe estimator is the clever choice of which "k" to use in the joint and marginal entropy estimates, and this idea is also used in Wibral et al. (2013).
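For concreteness, here is a brute-force sketch of a Frenzel-Pompe style nearest-neighbour estimate of I(X;Y|Z): the distance to the k-th neighbour in the joint space fixes the search radius, and digamma terms of the marginal neighbour counts are averaged. The max-norm, the choice k=4, and the toy Gaussian data are arbitrary choices for illustration; the same formula applies unchanged when the entries are vectors, only the distance computation changes:

```python
import random

def psi(n):
    """Digamma at a positive integer n: psi(n) = -gamma + H_{n-1}."""
    return -0.5772156649015329 + sum(1.0 / i for i in range(1, n))

def cmi_knn(xs, ys, zs, k=4):
    """Frenzel-Pompe style k-nearest-neighbour estimate of I(X;Y|Z) in nats.
    Max-norm distances, brute force O(n^2) for clarity."""
    pts = list(zip(xs, ys, zs))
    n = len(pts)
    est = 0.0
    for i, (xi, yi, zi) in enumerate(pts):
        # eps: distance to the k-th nearest neighbour in the joint (x,y,z) space
        dists = sorted(max(abs(xi - a), abs(yi - b), abs(zi - c))
                       for j, (a, b, c) in enumerate(pts) if j != i)
        eps = dists[k - 1]
        # neighbour counts within eps in the marginal subspaces
        nxz = sum(1 for j, (a, _, c) in enumerate(pts)
                  if j != i and max(abs(xi - a), abs(zi - c)) < eps)
        nyz = sum(1 for j, (_, b, c) in enumerate(pts)
                  if j != i and max(abs(yi - b), abs(zi - c)) < eps)
        nz = sum(1 for j, (_, _, c) in enumerate(pts)
                 if j != i and abs(zi - c) < eps)
        est += psi(k) - psi(nxz + 1) - psi(nyz + 1) + psi(nz + 1)
    return est / n

random.seed(1)
z = [random.gauss(0, 1) for _ in range(400)]
x = [v + random.gauss(0, 1) for v in z]  # X depends on Z
y = [v + random.gauss(0, 1) for v in z]  # Y depends on Z only, not on X
val = cmi_knn(x, y, z)
print(val)  # close to zero: X and Y are conditionally independent given Z
```

A production implementation would use a k-d tree for the neighbour searches instead of the quadratic scan shown here.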

Additional Reference:
Eichler, M. (2012). Graphical modelling of multivariate time series. Probability Theory and Related Fields, 1, 233. doi:10.1007/s00440-011-0345-8

No competing interests declared.

Error in the proof given in http://arxiv.org/abs/1304.7930v1

MichaelW replied to jakobrunge on 11 Nov 2013 at 10:28 GMT

Dear Jakob,

After critical evaluation we found that the proof of interaction delay reconstruction via MIT given in your above reference on arXiv contains an error in equation (4):

On both sides of the equation the conditioning is performed on the parents of x(t-delta-xi); for a proof of delay reconstruction using the momentary information transfer, however, the conditioning on the right-hand side of equation (4) should be on the parents of x(t-delta), not on those of x(t-delta-xi). Otherwise the conditioning ...

(a) ... does not represent MIT for x(t-delta), as it should for a proof of maximality at the correct delay.
(b) ... is potentially on a future sample of x as xi is allowed to be negative in our original proof.

In addition, we have carefully checked the example given in our PLOS ONE paper, and still find that MIT does not reconstruct the delay correctly, even though this is a very simple case with one source and one target, and the number of parents of the source x and the target y is very small.

Michael Wibral

No competing interests declared.

RE: Error in the proof given in http://arxiv.org/abs/1304.7930v1

jakobrunge replied to MichaelW on 17 Jan 2014 at 11:40 GMT

I thank Michael Wibral for bringing this error to my attention. Indeed, Eq.(4) in the arXiv article does not prove the maximality of the bivariate BivMIT used in Pompe and Runge (Physical Review E, 2011). We have now uploaded a corrected version (http://arxiv.org/abs/1304...) where we show that the BivMIT could be maximal for a different delay. Interestingly, BivMIT can be regarded to some extent as a derivative of Wibral et al.'s measure and the maximum, therefore, depends on its decay rate.

We also found that while the maximality of Wibral et al.'s measure can be proven for the bivariate case with ONE coupling delay (possibly also in the other direction), it is not necessarily maximal anymore if there are TWO (or more) coupling delays in one direction.

Note that the subsequent articles Runge et al. (Physical Review Letters, 2012, and Physical Review E, 2012) do not propose the BivMIT to measure interaction delays, but an iterative approach involving a measure very similar to Wibral et al.'s measure (but already published in 2012). This approach correctly identifies interaction delays also for multiple delays and the general multivariate case. Runge et al. (Physical Review Letters, 2012) also state the underlying assumptions for such an inference.

An interesting question is how the past states used in Wibral et al.'s approach, inferred by the Ragwitz criterion (for the univariate time series case), differ from the parents used in our approach (for the multivariate case, including the univariate one). The parents of a process are properly defined by conditional independence and inferred by a consistent estimator, the PC algorithm.

Further details can be found in http://arxiv.org/abs/1304....

References:
Pompe, B., & Runge, J. (2011). Momentary information transfer as a coupling measure of time series. Phys. Rev. E, 83(5), 1–12.

Runge, J., Heitzig, J., Petoukhov, V., & Kurths, J. (2012). Escaping the Curse of Dimensionality in Estimating Multivariate Transfer Entropy. Phys. Rev. Lett., 108(25), 1–4.

Runge, J., Heitzig, J., Marwan, N., & Kurths, J. (2012). Quantifying Causal Coupling Strength: A Lag-specific Measure For Multivariate Time Series Related To Transfer Entropy. Phys. Rev. E, 86(6), 1–15.

No competing interests declared.