In 2013, [[Giulio Tononi]]'s student, [[Erik Hoel]], further refined the concept of EI to quantitatively characterize emergence, leading to the development of the theory of [[Causal Emergence]]<ref name=hoel_2013>{{cite journal|last1=Hoel|first1=Erik P.|last2=Albantakis|first2=L.|last3=Tononi|first3=G.|title=Quantifying causal emergence shows that macro can beat micro|journal=Proceedings of the National Academy of Sciences|volume=110|issue=49|page=19790–19795|year=2013|url=https://doi.org/10.1073/pnas.1314922110}}</ref>. In this theory, Hoel used [[Judea Pearl]]'s [[do operator]] to modify the general [[Mutual Information]] metric <ref name="pearl_causality">{{cite book|title=因果论——模型、推理和推断|author1=Judea Pearl|author2=刘礼|author3=杨矫云|author4=廖军|author5=李廉|publisher=机械工业出版社|year=2022|month=4}}</ref>, which made EI fundamentally different from [[Mutual Information]]. While [[Mutual Information]] measures correlation, EI—due to the use of the [[do operator]]—measures causality. The article also introduced a [[Normalized Version of EI]], referred to as Eff.
 
Traditionally, EI was primarily applied to discrete-state [[Markov Chains]]. To extend it to continuous domains, P. Chvykov and E. Hoel proposed the theory of [[Causal Geometry]] in 2020<ref name=Chvykov_causal_geometry>{{cite journal|author1=Chvykov P|author2=Hoel E.|title=Causal Geometry|journal=Entropy|year=2021|volume=23|issue=1|page=24|url=https://doi.org/10.3390/e2}}</ref>, expanding EI's definition to function mappings with continuous state variables. By incorporating [[Information Geometry]], they explored a perturbative form of EI and compared it with [[Fisher Information]]. However, this way of calculating EI for continuous variables required the variance of the normally distributed variables to be infinitesimal, which is an overly stringent condition.
In 2022, to address the calculation of EI in general [[Feedforward Neural Networks]], [[Zhang Jiang]] and [[Liu Kaiwei]] removed the variance constraint from the [[Causal Geometry]] approach and explored a more general form of EI<ref name=zhang_nis>{{cite journal|title=Neural Information Squeezer for Causal Emergence|first1=Jiang|last1=Zhang|first2=Kaiwei|last2=Liu|journal=Entropy|year=2022|volume=25|issue=1|page=26|url=https://api.semanticscholar.org/CorpusID:246275672}}</ref><ref name=yang_nis+>{{cite journal|title=Finding emergence in data by maximizing effective information|author1=Mingzhe Yang|author2=Zhipeng Wang|author3=Kaiwei Liu|author4=Yingqi Rong|author5=Bing Yuan|author6=Jiang Zhang|journal=arXiv|page=2308.09952|year=2024}}</ref><ref name=liu_exact>{{cite journal|title=An Exact Theory of Causal Emergence for Stochastic Iterative Systems|author1=Kaiwei Liu|author2=Bing Yuan|author3=Jiang Zhang|journal=arXiv|page=2405.09207|year=2024}}</ref>. Nonetheless, a limitation remained: because the uniform distribution of variables in the real-number domain is strictly defined over an infinite space, the calculation of EI involved a parameter [math]L[/math], representing the range of the uniform distribution. To avoid this issue and enable comparisons of EI at different levels of [[Granularity]], the authors proposed the concept of [[Dimension-averaged EI]]. They found that the [[Measure of Causal Emergence]] defined by [[Dimension-averaged EI]] was solely dependent on the determinant of the [[Neural Network]]'s [[Jacobian Matrix]] and the variance of the random variables in the two compared dimensions, independent of other parameters such as [math]L[/math]. Additionally, [[Dimension-averaged EI]] could be viewed as a [[Normalized EI]], or Eff.
Essentially, EI is a quantity that depends only on the [[Dynamics]] of a [[Markov Dynamic System]], specifically on the [[Markov State Transition Matrix]], and is independent of the distribution of the state variables. This point was not highlighted in earlier work; a 2024 review by [[Yuan Bing]] and colleagues emphasized it and provided an explicit form of EI that depends only on the Markov state transition matrix<ref name=review>{{cite journal|last1=Yuan|first1=Bing|last2=Zhang|first2=Jiang|last3=Lyu|first3=Aobo|last4=Wu|first4=Jiaying|last5=Wang|first5=Zhipeng|last6=Yang|first6=Mingzhe|last7=Liu|first7=Kaiwei|last8=Mou|first8=Muyun|last9=Cui|first9=Peng|year=2024|title=Emergence and Causality in Complex Systems: A Survey of Causal Emergence and Related Quantitative Studies|journal=Entropy|volume=26|issue=2|page=108|url=https://doi.org/10.3390/e26020108}}</ref>. In their latest work on [[Dynamical Reversibility]] and [[Causal Emergence]], [[Zhang Jiang]] and colleagues pointed out that EI is in fact a characterization of the reversibility of the underlying [[Markov Transition Matrix]], and they attempted to directly characterize the reversibility of Markov chain dynamics as a replacement for EI<ref name=zhang_reversibility>{{cite journal|author1=Jiang Zhang|author2=Ruyi Tao|author3=Keng Hou Leong|author4=Mingzhe Yang|author5=Bing Yuan|year=2024|title=Dynamical reversibility and a new theory of causal emergence|url=https://arxiv.org/abs/2402.15054|journal=arXiv}}</ref>.
      
=Overview =
 
The EI metric is primarily used to measure the strength of causal effects in Markov dynamics. Unlike general causal inference theories, EI is used in cases where the dynamics (the Markov transition probability matrix) are known and no unknown variables (i.e., [[Confounders]]) are present. Its core objective is to measure the strength of causal connections, rather than the existence of causal effects. This means EI is more suitable for scenarios where a causal relationship between variables X and Y is already established.
Formally, EI is a function of the causal mechanism (in a discrete-state [[Markov Chain]], this is the [[Probability Transition Matrix]] of the Markov chain) and is independent of other factors. The formal definition of EI is:
    
<math>
EI \equiv I(X;Y|do(X\sim U))
</math>
where P represents the causal mechanism from X to Y, which is a probability transition matrix, [math]p_{ij}\equiv Pr(Y=j|X=i)[/math]; X is the cause variable, Y is the effect variable, and [math]do(X\sim U)[/math] denotes the [[do Intervention]] on X, changing its distribution to a uniform one. Under this intervention, and assuming the causal mechanism P remains unchanged, Y will be indirectly affected by the intervention on X. EI measures the mutual information between X and Y after this intervention.
The introduction of the do operator aims to eliminate the influence of X's distribution on EI, ensuring that the final EI metric is a function of the causal mechanism alone and is independent of the distribution of X.
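To make this definition concrete, the following is a minimal Python sketch of the computation; it is an illustration only (the function name <code>effective_information</code> and the NumPy-based implementation are illustrative choices, not code from the cited papers). It intervenes on X by assigning it a uniform distribution, keeps the transition matrix P fixed, and evaluates the resulting mutual information.

<syntaxhighlight lang="python">
import numpy as np

def effective_information(P, base=2):
    """EI of a row-stochastic transition matrix P, i.e. I(X;Y | do(X ~ U))."""
    P = np.asarray(P, dtype=float)
    N = P.shape[0]
    p_x = np.full(N, 1.0 / N)        # do(X ~ U): force the cause to be uniform
    joint = P * p_x[:, None]         # Pr(X=i, Y=j) = p_ij / N
    p_y = joint.sum(axis=0)          # marginal distribution of the effect Y
    mask = joint > 0                 # zero-probability entries contribute nothing
    ratio = joint[mask] / (p_x[:, None] * p_y[None, :])[mask]
    return float(np.sum(joint[mask] * np.log(ratio)) / np.log(base))
</syntaxhighlight>

For instance, a deterministic 2-state swap (state 1 goes to state 2 and back) gives EI = 1 bit under this sketch.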
    
Below are three examples of Markov chains, with their respective EI values included:
 
|}{{NumBlk|:||{{EquationRef|example}}}}
 
As we can see, the EI of the first matrix [math]P_1[/math] is higher than that of the second matrix [math]P_2[/math] because [math]P_1[/math] is fully deterministic: starting from any given state, it transitions to a single next state with 100% probability. However, not every deterministic matrix has high EI; matrix [math]P_3[/math] is a counterexample. Although its transition probabilities are also either 100% or 0, all of the last three states transition to the first state, so from the current state we cannot distinguish which state the system was in at the previous moment. This property is called '''Degeneracy''', and it lowers EI. Hence, if a transition matrix has high determinism and low degeneracy, its EI will be high. Additionally, EI can be decomposed as follows:
 
<math>
EI = Det - Deg
</math>
Where Det stands for Determinism, and Deg stands for Degeneracy. EI is the difference between the two. In the table, we also list the values of Det and Deg corresponding to the matrices.
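The following sketch computes the two terms under one common convention (Det from the average entropy of the rows of P, Deg from the entropy of the averaged row, each offset by [math]\log_2 N[/math]); since the exact formulas are not reproduced above, this convention is an assumption here, but the difference Det - Deg recovers exactly the EI computed by the earlier sketch.

<syntaxhighlight lang="python">
import numpy as np

def determinism_degeneracy(P, base=2):
    """Return (Det, Deg, EI) for a row-stochastic transition matrix P.

    Convention (an assumption here): Det = log(N) - <H(row_i)>,
    Deg = log(N) - H(average row); then Det - Deg equals EI under do(X ~ U).
    """
    P = np.asarray(P, dtype=float)
    N = P.shape[0]

    def entropy(p):
        p = p[p > 0]
        return float(-np.sum(p * np.log(p)) / np.log(base))

    det = np.log(N) / np.log(base) - np.mean([entropy(row) for row in P])
    deg = np.log(N) / np.log(base) - entropy(P.mean(axis=0))
    return det, deg, det - deg
</syntaxhighlight>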
 
The first transition probability matrix is a [[Permutation]] matrix and is reversible; thus, it has the highest determinism, no degeneracy, and therefore the highest EI. In the second matrix, the first three states transition to one another with probability 1/3, giving the lowest determinism but also low degeneracy, and an EI of 0.81. The third matrix, although its transitions are deterministic (all probabilities are 0 or 1), has high degeneracy because the last three states all transition to state 1, so the previous state cannot be inferred. Thus, its EI equals that of the second matrix at 0.81.
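To connect these observations with the sketches above, one can plug in hypothetical 4-state reconstructions of [math]P_1[/math], [math]P_2[/math] and [math]P_3[/math]. Since the original table is not reproduced in this excerpt, the matrices below are guesses that merely match the verbal description and the quoted value of 0.81.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical reconstructions consistent with the description above.
P1 = np.array([[0, 1, 0, 0],          # a permutation: deterministic and reversible
               [0, 0, 1, 0],
               [0, 0, 0, 1],
               [1, 0, 0, 0]], dtype=float)

P2 = np.array([[1/3, 1/3, 1/3, 0],    # the first three states mix with probability 1/3
               [1/3, 1/3, 1/3, 0],
               [1/3, 1/3, 1/3, 0],
               [0,   0,   0,   1]], dtype=float)

P3 = np.array([[0, 1, 0, 0],          # deterministic, but the last three states all
               [1, 0, 0, 0],          # collapse onto state 1 (degenerate)
               [1, 0, 0, 0],
               [1, 0, 0, 0]], dtype=float)

for name, P in [("P1", P1), ("P2", P2), ("P3", P3)]:
    det, deg, ei = determinism_degeneracy(P)   # from the previous sketch
    print(f"{name}: Det={det:.2f}, Deg={deg:.2f}, EI={ei:.2f}")
# P1: Det=2.00, Deg=0.00, EI=2.00
# P2: Det=0.81, Deg=0.00, EI=0.81
# P3: Det=2.00, Deg=1.19, EI=0.81
</syntaxhighlight>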
Although EI was mostly applied to discrete-state Markov chains in the original literature<ref name=hoel_2013 />, [[Zhang Jiang]], [[Liu Kaiwei]], and [[Yang Mingzhe]] extended the definition to more general continuous-variable cases<ref name=zhang_nis /><ref name=yang_nis+ /><ref name=liu_exact />. This extension builds on EI's original definition by intervening on the cause variable X so that it follows a uniform distribution over a sufficiently large bounded interval, [math][-\frac{L}{2}, \frac{L}{2}]^n[/math]. The causal mechanism is assumed to be a conditional probability that follows a Gaussian distribution with mean function [math]f(x)[/math] and covariance matrix [math]\Sigma[/math]. Based on this, the EI between the cause and effect variables is then measured. The causal mechanism here is determined by the mapping [math]f(x)[/math] and the covariance matrix, which together define the conditional probability [math]Pr(y|x)[/math].
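A rough numerical sketch of this continuous setting, for a one-dimensional linear mechanism, is given below. The parameter values, the histogram-based entropy estimate, and the variable names are illustrative assumptions rather than the procedure used in the cited papers; the sketch only shows how the uniform intervention over [math][-\frac{L}{2}, \frac{L}{2}][/math] and the Gaussian mechanism enter the calculation.

<syntaxhighlight lang="python">
import numpy as np

# Sketch: y = a*x + noise with noise ~ N(0, sigma^2), and do(X ~ U(-L/2, L/2)).
# EI = I(X;Y) = h(Y) - h(Y|X), with h(Y|X) = 0.5 * ln(2*pi*e*sigma^2) for Gaussian noise.
rng = np.random.default_rng(0)
a, sigma, L, n_samples = 2.0, 0.1, 10.0, 1_000_000

x = rng.uniform(-L / 2, L / 2, n_samples)       # uniform intervention on the cause
y = a * x + rng.normal(0.0, sigma, n_samples)   # Gaussian causal mechanism

# Histogram estimate of the differential entropy h(Y), in nats.
counts, edges = np.histogram(y, bins=1000)
p = counts / counts.sum()
width = edges[1] - edges[0]
h_y = -np.sum(p[p > 0] * np.log(p[p > 0])) + np.log(width)

h_y_given_x = 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)
print(f"estimated EI (nats): {h_y - h_y_given_x:.3f}")
# For L much larger than sigma/|a|, this approaches ln(|a|*L) - 0.5*ln(2*pi*e*sigma^2),
# which makes the dependence on the interval size L explicit.
</syntaxhighlight>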
    
More detailed explanations follow.
 