In 2022, to address the calculation of EI in general [[Feedforward Neural Networks]], [[Zhang Jiang]] and [[Liu Kaiwei]] removed the variance constraint from the [[Causal Geometry]] approach and explored a more general form of EI<ref name=zhang_nis>{{cite journal|title=Neural Information Squeezer for Causal Emergence|first1=Jiang|last1=Zhang|first2=Kaiwei|last2=Liu|journal=Entropy|year=2022|volume=25|issue=1|page=26|url=https://api.semanticscholar.org/CorpusID:246275672}}</ref><ref name=yang_nis+>{{cite journal|title=Finding emergence in data by maximizing effective information|author1=Mingzhe Yang|author2=Zhipeng Wang|author3=Kaiwei Liu|author4=Yingqi Rong|author5=Bing Yuan|author6=Jiang Zhang|journal=arXiv|page=2308.09952|year=2024}}</ref><ref name=liu_exact>{{cite journal|title=An Exact Theory of Causal Emergence for Stochastic Iterative Systems|author1=Kaiwei Liu|author2=Bing Yuan|author3=Jiang Zhang|journal=arXiv|page=2405.09207|year=2024}}</ref>. Nonetheless, a limitation remained: because a uniform distribution cannot be properly defined over the entire real domain, the calculation of EI involved a parameter [math]L[/math] representing the range of the uniform intervention distribution. To avoid this issue and enable comparisons of EI at different levels of [[Granularity]], the authors proposed the concept of [[Dimension-averaged EI]]. They found that the [[Measure of Causal Emergence]] defined by [[Dimension-averaged EI]] depends only on the determinant of the [[Neural Network]]'s [[Jacobian Matrix]] and the variances of the random variables at the two compared scales, independent of other parameters such as [math]L[/math]. Additionally, [[Dimension-averaged EI]] can be viewed as a [[Normalized EI]], or Eff.
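As a rough sketch (notation chosen here for illustration), if <math>n</math> denotes the dimension of the state space at a given scale, the dimension-averaged EI and the associated causal emergence measure take the form

<math>
\mathcal{J} = \frac{EI}{n}, \qquad \Delta\mathcal{J} = \mathcal{J}_M - \mathcal{J}_m ,
</math>

where <math>\mathcal{J}_M</math> and <math>\mathcal{J}_m</math> refer to the macro- and micro-dynamics, respectively.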
Essentially, EI is a quantity that depends only on the [[Dynamics]] of a [[Markov Dynamic System]]—specifically on the [[Markov State Transition Matrix]]<ref name=review>{{cite journal|last1=Yuan|first1=Bing|last2=Zhang|first2=Jiang|last3=Lyu|first3=Aobo|last4=Wu|first4=Jiaying|last5=Wang|first5=Zhipeng|last6=Yang|first6=Mingzhe|last7=Liu|first7=Kaiwei|last8=Mou|first8=Muyun|last9=Cui|first9=Peng|year=2024|title=Emergence and Causality in Complex Systems: A Survey of Causal Emergence and Related Quantitative Studies|journal=Entropy|volume=26|issue=2|page=108|url=https://doi.org/10.3390/e26020108}}</ref>. In their latest work on [[Dynamical Reversibility]] and [[Causal Emergence]], [[Zhang Jiang]] and colleagues pointed out that EI is actually a characterization of the reversibility of the underlying [[Markov Transition Matrix]], and they attempted to directly characterize the reversibility of Markov chain dynamics as a replacement for EI<ref name=zhang_reversibility>{{cite journal|author1=Jiang Zhang|author2=Ruyi Tao|author3=Keng Hou Leong|author4=Mingzhe Yang|author5=Bing Yuan|year=2024|title=Dynamical reversibility and a new theory of causal emergence|url=https://arxiv.org/abs/2402.15054|journal=arXiv}}</ref>.
The first transition probability matrix is a [[Permutation]] matrix and is reversible; thus, it has the highest determinism, no degeneracy, and therefore the highest EI. In the second matrix, the first three states transition to one another with probability 1/3 each, resulting in the lowest determinism but also low degeneracy, yielding an EI of 0.81. The third matrix, despite having deterministic (0-1) transitions, has high degeneracy because its first three states all transition to state 1, so from state 1 we cannot infer which state the system came from. Thus, its EI also equals 0.81, the same as that of the second matrix.
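The following minimal sketch reproduces these numbers. Since only the first three states of each matrix are described above, it assumes four-state chains whose fourth state maps to itself; EI is computed, in bits, as the average KL divergence between each row and the mean row under a uniform intervention.

<syntaxhighlight lang="python">
import numpy as np

def effective_information(tpm):
    """EI (in bits) of a transition probability matrix under a uniform
    intervention: the average KL divergence between each row and the mean row."""
    tpm = np.asarray(tpm, dtype=float)
    n_states = tpm.shape[0]
    mean_row = tpm.mean(axis=0)   # distribution of the next state when the current state is uniform
    ei = 0.0
    for row in tpm:
        for p, q in zip(row, mean_row):
            if p > 0:             # 0 * log(0) is treated as 0
                ei += p * np.log2(p / q)
    return ei / n_states

# Fourth state assumed to map to itself (not specified in the text above).

# First matrix: a permutation (fully reversible), EI = log2(4) = 2 bits
P1 = [[0, 1, 0, 0],
      [0, 0, 1, 0],
      [0, 0, 0, 1],
      [1, 0, 0, 0]]

# Second matrix: the first three states mix with probability 1/3 each
P2 = [[1/3, 1/3, 1/3, 0],
      [1/3, 1/3, 1/3, 0],
      [1/3, 1/3, 1/3, 0],
      [0,   0,   0,   1]]

# Third matrix: deterministic, but the first three states all go to state 1
P3 = [[1, 0, 0, 0],
      [1, 0, 0, 0],
      [1, 0, 0, 0],
      [0, 0, 0, 1]]

for name, P in [("P1", P1), ("P2", P2), ("P3", P3)]:
    print(name, round(effective_information(P), 2))   # P1 2.0, P2 0.81, P3 0.81
</syntaxhighlight>

The mean row plays the role of the effect distribution under the intervention; for the third matrix it is concentrated mostly on state 1, which is where the degeneracy comes from.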
Although in the original literature<ref name=hoel_2013 /> EI was mostly applied to discrete-state Markov chains, [[Zhang Jiang]], [[Liu Kaiwei]], and [[Yang Mingzhe]] extended the definition to more general continuous-variable cases<ref name=zhang_nis /><ref name=yang_nis+ /><ref name=liu_exact />. This extension builds on EI's original definition by intervening on the cause variable [math]X[/math] so that it follows a uniform distribution over a sufficiently large bounded region [math][-\frac{L}{2}, \frac{L}{2}]^n[/math]. The causal mechanism is assumed to be a conditional distribution that is Gaussian with mean function [math]f(x)[/math] and covariance matrix [math]\Sigma[/math]. The EI between the cause and effect variables is then measured on this basis. The causal mechanism here is thus determined by the mapping [math]f(x)[/math] and the covariance matrix, which together define the conditional probability [math]Pr(y|x)[/math].
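A sketch of how this yields a closed-form expression (the exact form used in the cited papers may differ in detail), assuming <math>L</math> is large enough that boundary effects are negligible and <math>f</math> is injective on the intervention region: the Gaussian mechanism has conditional entropy <math>\tfrac{1}{2}\ln\left[(2\pi e)^n \det\Sigma\right]</math>, the entropy of the effect variable is approximately <math>\ln L^n</math> plus the average log-determinant of the Jacobian of <math>f</math>, and their difference gives

<math>
EI \approx \ln \frac{L^n}{\left[(2\pi e)^{n}\det\Sigma\right]^{1/2}} \;+\; \mathbb{E}_{x \sim \mathcal{U}\left([-\frac{L}{2},\frac{L}{2}]^n\right)}\left[\ln \left|\det \frac{\partial f(x)}{\partial x}\right|\right].
</math>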
    
More detailed explanations follow.
The final equation shows that EI is composed of two terms: the first term is the average of the negative entropy of each row of the causal mechanism matrix, and the second term is the entropy of the variable [math]Y[/math]. In the first term, the probability distribution [math]Pr(X=x)[/math] of [math]X[/math] acts as the weight when averaging the entropies of the rows. Only when we set this weight to the same value for every row (i.e., intervene to make [math]X[/math] uniformly distributed) do we treat each row of the causal mechanism matrix equally.
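Written out for a discrete chain with <math>N</math> states, with <math>P_i</math> denoting the <math>i</math>-th row of the causal mechanism matrix (notation used here for illustration), the uniform intervention gives every row the same weight <math>1/N</math>, and the two terms read

<math>
EI = -\frac{1}{N}\sum_{i=1}^{N} H(P_i) \;+\; H\!\left(\frac{1}{N}\sum_{i=1}^{N} P_i\right),
</math>

where the second term is the entropy of <math>Y</math> under that intervention.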
If the distribution is not uniform, some rows will be assigned a larger weight, while others will be given a smaller weight. This weight represents a certain bias, which prevents the EI from reflecting the natural properties of the causal mechanism.
    
=Effective Information of Markov Chains=
 
The first transition probability matrix is a [[Permutation Matrix]], which is invertible. It has the highest determinism and no degeneracy, leading to the maximum EI. In the second matrix, the first three states transition to one another with equal probability (1/3), resulting in the lowest determinism but no degeneracy, with an EI of 0.81. The third matrix is deterministic, but since its first three states all transition to the first state, it is impossible to infer from state 1 which previous state led to it. Therefore, it has high degeneracy, and its EI is also 0.81, the same as that of the second matrix.
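These two effects can be separated explicitly. In Hoel's decomposition (sketched here in the notation above, with <math>U^N</math> the uniform distribution over the <math>N</math> states, <math>P_i</math> the <math>i</math>-th row, and <math>\overline{P}</math> the average row), EI splits into a determinism term minus a degeneracy term:

<math>
EI = \underbrace{\frac{1}{N}\sum_{i=1}^{N} D_{KL}\!\left(P_i \,\|\, U^N\right)}_{\text{determinism}} \;-\; \underbrace{D_{KL}\!\left(\overline{P} \,\|\, U^N\right)}_{\text{degeneracy}} .
</math>

For the third matrix above (assuming, as before, that its fourth state maps to itself), the determinism term is maximal (<math>\log_2 4 = 2</math> bits) while the degeneracy term is <math>\tfrac{3}{4}\log_2 3 \approx 1.19</math> bits, recovering <math>EI \approx 0.81</math>; the normalized versions of these quantities are discussed in the next subsection.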
    
===Normalized Determinism and Degeneracy===
 
==Dimension-Averaged EI==
 
In discrete-state systems, when comparing systems of different scales, we can compute either the direct EI difference or the normalized EI difference. Normalized EI is EI divided by [math]\log N[/math], where [math]N=\#(\mathcal{X})[/math] is the number of elements in the discrete state space [math]\mathcal{X}[/math].
    
However, for continuous variables, using the original EI can lead to unreasonable results. Firstly, as shown in equation {{EquationNote|6}}, the EI formula contains a term [math]\ln L^n[/math]; since [math]L[/math] is a large positive number, the EI result will be significantly affected by [math]L[/math]. Secondly, when calculating normalized EI (Eff), the issue arises that for continuous variables the number of elements in the state space is infinite. A potential solution is to treat the volume of the space as the number [math]N[/math], and thus normalize EI by [math]n \ln L[/math], a quantity proportional to [math]n[/math] and [math]\ln L[/math].
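One normalization consistent with this description (a sketch; conventions may differ in detail), written next to its discrete counterpart, is

<math>
\mathrm{Eff} = \frac{EI}{\log N} \quad \text{(discrete)}, \qquad \mathrm{Eff} \approx \frac{EI}{n \ln L} \quad \text{(continuous)} .
</math>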
The concept of Effective Information (EI) was first introduced in a paper<ref name="tononi_2003">{{cite journal |last1=Tononi|first1=G.|last2=Sporns|first2=O.|title=Measuring information integration|journal=BMC Neuroscience|volume=4 |issue=31 |year=2003|url=https://doi.org/10.1186/1471-2202-4-31}}</ref> by Tononi and Sporns (2003). In this article, the authors defined an indicator of [[Integrated Information Ability]] and established [[Integrated Information Theory (IIT)]], which later evolved into an important branch of consciousness theory. The definition of this indicator is based on effective information.
 
===EI and Φ===
 
The integrated information (or degree of integration) <math>\Phi</math> can be defined as the minimum, over all bipartitions of a system, of the EI across the bipartition. Suppose the system is 𝑋, and 𝑆 is a subset of 𝑋 that is partitioned into two parts, 𝐴 and 𝐵. There are causal interactions between 𝐴, 𝐵, and the rest of 𝑋. [[文件:OriginalEI.png|350x350px|The Division in Integrated Information Theory|替代=|缩略图|链接=https://wiki.swarma.org/index.php/%E6%96%87%E4%BB%B6:OriginalEI.png]] In this scenario, we can measure the strength of these causal interactions. First, we calculate the EI from 𝐴 to 𝐵, i.e., we intervene on 𝐴 so that it follows the maximum entropy distribution, and then measure the mutual information between 𝐴 and 𝐵:
    
<math>
EI(A \rightarrow B) = MI(A^{H^{max}} : B)
</math>