=====Boolean Network Example=====

Another example in the literature [1] is causal emergence in a Boolean network. As shown in the figure, this is a Boolean network with 4 nodes. Each node has two states, 0 and 1. Each node is connected to two other nodes and follows the same microscopic dynamical mechanism (figure a). The system therefore has sixteen microscopic states in total, and its dynamics can be represented by a 16×16 state transition matrix (figure c).
    
The coarse-graining of this system proceeds in two steps. The first step is to cluster the nodes of the Boolean network: as shown in figure b, merge A and B into the macroscopic node [math]\alpha[/math], and merge C and D into the macroscopic node [math]\beta[/math]. The second step is to map the states of the microscopic nodes in each group to the state of the merged macroscopic node. This mapping function is shown in figure d: every microscopic pair state containing a 0 is mapped to the "off" state of the macroscopic node, while the microscopic state 11 is mapped to the "on" state. In this way, we obtain a new macroscopic Boolean network, and can derive its dynamical mechanism from that of the microscopic nodes. From this mechanism, the state transition matrix of the macroscopic network follows (figure e).
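To make the two-step procedure concrete, here is a minimal Python sketch. The microscopic transition matrix <code>P_micro</code> is an assumed input (in the paper it follows the mechanism of figure a), and averaging transition probabilities uniformly within each macroscopic group is our illustrative choice, not necessarily the paper's exact construction:

<syntaxhighlight lang="python">
import numpy as np
from itertools import product

def macro_state(bits):
    """Map a micro state (A, B, C, D) to a macro state (alpha, beta):
    a node pair is 'on' (1) only when both of its micro nodes are 1."""
    a, b, c, d = bits
    return (int(a == 1 and b == 1), int(c == 1 and d == 1))

def coarse_grain(P_micro):
    """Reduce a 16x16 micro TPM to a 4x4 macro TPM by grouping micro
    states and averaging transition probabilities uniformly per group."""
    micro = list(product([0, 1], repeat=4))
    macro = list(product([0, 1], repeat=2))
    groups = {m: [i for i, s in enumerate(micro) if macro_state(s) == m]
              for m in macro}
    P_macro = np.zeros((4, 4))
    for i, mi in enumerate(macro):
        for j, mj in enumerate(macro):
            # probability of landing in group mj, averaged over group mi
            P_macro[i, j] = P_micro[np.ix_(groups[mi], groups[mj])].sum(axis=1).mean()
    return P_macro
</syntaxhighlight>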
=====Causal Emergence in Continuous Variables=====

Furthermore, in [42], Hoel et al. proposed the theoretical framework of causal geometry, attempting to generalize the causal emergence theory to function mappings and dynamical systems with continuous states. This work defines <math>EI</math> for random function mappings, introduces the concepts of intervention noise and causal geometry, and compares and analogizes these with information geometry. Liu Kaiwei et al. [43] further gave an exact analytical theory of causal emergence for random iterative dynamical systems.
    
====Rosas's Causal Emergence Theory====

Rosas et al. [37], starting from the theory of [[information decomposition]], proposed a method for defining causal emergence based on [[integrated information decomposition]], and further divided causal emergence into two parts: [[causal decoupling]] and [[downward causation]]. Causal decoupling denotes the causal effect of the macroscopic state at the current moment on the macroscopic state at the next moment, while downward causation denotes the causal effect of the macroscopic state at the current moment on the microscopic state at the next moment. Schematic diagrams of causal decoupling and downward causation are shown in the figure below. The microscopic state input is <math>X_t\ (X_t^1,X_t^2,\ldots,X_t^n)</math>, and the macroscopic state <math>V_t</math> is obtained by coarse-graining the microscopic state variable <math>X_t</math>, so it is a supervenient feature of <math>X_t</math>; <math>X_{t+1}</math> and <math>V_{t+1}</math> denote the microscopic and macroscopic states at the next moment, respectively.
    
[[文件:向下因果与因果解耦2.png|居左|300x300像素|因果解耦与向下因果]]
 
=====Partial Information Decomposition=====
This method is based on the nonnegative decomposition of multivariate information theory proposed by Williams and Beer [44]. Partial information decomposition (PID) is used to decompose the mutual information between microstates and macrostates.

Without loss of generality, assume that the microstate is <math>X=(X^1,X^2)</math>, i.e. a two-dimensional variable, and the macrostate is <math>V</math>. The mutual information between the two can then be decomposed into four parts:

<math>I(X^1,X^2;V)=Red(X^1,X^2;V)+Un(X^1;V|X^2)+Un(X^2;V|X^1)+Syn(X^1,X^2;V)</math>

Here, <math>Red(X^1,X^2;V)</math> is the redundant information, i.e. the information provided repeatedly by both microstates <math>X^1</math> and <math>X^2</math> about the macrostate <math>V</math>; <math>Un(X^1;V|X^2)</math> and <math>Un(X^2;V|X^1)</math> are the unique information, i.e. the information that each microstate variable alone provides about the macrostate; and <math>Syn(X^1,X^2;V)</math> is the synergistic information, i.e. the information that all microstates <math>X</math> jointly provide about the macrostate <math>V</math>.
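The decomposition becomes computable only once a redundancy function is fixed. The sketch below is our illustration (not code from [44]): it uses Williams and Beer's original redundancy measure <math>I_{\min}</math> to obtain all four parts from a discrete joint distribution <code>p[x1, x2, v]</code>; other redundancy measures would yield different numbers:

<syntaxhighlight lang="python">
import numpy as np

def mutual_info(pxy):
    """I(X;Y) in bits from a joint probability table p(x, y)."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return (pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum()

def specific_info(pxv, v):
    """I(X; V=v): information X provides about the specific outcome v."""
    pv = pxv.sum(axis=0)[v]
    px = pxv.sum(axis=1)
    return sum((pxv[x, v] / pv) * np.log2((pxv[x, v] / px[x]) / pv)
               for x in range(pxv.shape[0]) if pxv[x, v] > 0)

def pid_williams_beer(p):
    """Williams-Beer PID of I(X1,X2;V). p has shape (n1, n2, nv).
    Returns (Red, Un1, Un2, Syn)."""
    p1v, p2v = p.sum(axis=1), p.sum(axis=0)       # p(x1,v), p(x2,v)
    pv = p.sum(axis=(0, 1))
    red = sum(pv[v] * min(specific_info(p1v, v), specific_info(p2v, v))
              for v in range(len(pv)) if pv[v] > 0)
    i1, i2 = mutual_info(p1v), mutual_info(p2v)
    i12 = mutual_info(p.reshape(-1, p.shape[2]))  # I(X1,X2;V)
    return red, i1 - red, i2 - red, i12 - i1 - i2 + red

# XOR example: X1, X2 fair coins, V = X1 XOR X2 -> purely synergistic.
p = np.zeros((2, 2, 2))
for x1 in (0, 1):
    for x2 in (0, 1):
        p[x1, x2, x1 ^ x2] = 0.25
print(pid_williams_beer(p))  # Red = Un1 = Un2 = 0, Syn = 1 bit
</syntaxhighlight>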
=====Definition of Causal Emergence=====
However, the PID framework can only decompose the mutual information between multiple source variables and a single target variable. Rosas extended this framework and proposed the integrated information decomposition method <math>\Phi ID</math> [45], which handles the mutual information between multiple source variables and multiple target variables, and can also be used to decompose the mutual information between different moments. Based on the decomposed information, the authors proposed two ways of defining causal emergence:
1) When the unique information <math>Un(V_t;X_{t+1}|X_t^1,\ldots,X_t^n)>0</math>, the macroscopic state <math>V_t</math> at the current moment provides more information about the overall system <math>X_{t+1}</math> at the next moment than the microscopic state <math>X_t</math> does; in this case, causal emergence occurs in the system;
2) The second method bypasses the choice of a specific macroscopic state <math>V_t</math> and defines causal emergence solely in terms of the synergistic information between the microscopic state <math>X_t</math> and the microscopic state <math>X_{t+1}</math> at the next moment: when the synergistic information <math>Syn(X_t^1,\ldots,X_t^n;X_{t+1}^1,\ldots,X_{t+1}^n)>0</math>, causal emergence occurs in the system.
Note that the first way of judging causal emergence depends on the choice of the macroscopic state <math>V_t</math>, and it provides a lower bound for the second: <math>Syn(X_t;X_{t+1})\geq Un(V_t;X_{t+1}|X_t)</math> always holds, so if <math>Un(V_t;X_{t+1}|X_t)>0</math>, causal emergence occurs in the system. However, choosing <math>V_t</math> usually requires a predefined coarse-graining function, so the limitation of Erik Hoel's causal emergence theory is not avoided. A natural alternative is the second method, which judges causal emergence by synergistic information alone; but computing synergistic information is very difficult and suffers from combinatorial explosion, so the second method is often computationally infeasible. In short, both quantitative characterizations of causal emergence have weaknesses, and a more reasonable quantification method is still needed.
=====Specific Example=====
[[文件:因果解耦以及向下因果例子1.png|500x500像素|居左|因果解耦以及向下因果例子]]
The authors of [37] give a concrete example (above) to illustrate when causal decoupling, downward causation, and causal emergence occur. The example is a special Markov process, where <math>p_{X_{t+1}|X_t}(x_{t+1}|x_t)</math> denotes the dynamics and <math>X_t=(x_t^1,\ldots,x_t^n)\in\{0,1\}^n</math> is the microstate. The process determines the probability of the next state <math>x_{t+1}</math> by checking the values of <math>x_t</math> and <math>x_{t+1}</math> at two consecutive moments. First, it checks whether the sum modulo 2 of all dimensions of <math>x_t</math> equals the first dimension of <math>x_{t+1}</math>: if not, the probability is 0. Otherwise, it checks whether <math>x_t</math> and <math>x_{t+1}</math> have the same sum modulo 2 over all dimensions: if both conditions hold, the transition probability is <math>\gamma/2^{n-2}</math>; otherwise it is <math>(1-\gamma)/2^{n-2}</math>. Here <math>\gamma</math> is a parameter and <math>n</math> is the total dimension of <math>x</math>.
In fact, if <math>\sum_{j=1}^n x^j_t</math> is even or 0, then <math>\oplus^n_{j=1} x^j_t:=1</math>; otherwise <math>\oplus^n_{j=1} x^j_t:=0</math>. The result of <math>\oplus^n_{j=1} x^j_t</math> is thus the parity of the entire sequence, and the first dimension can be regarded as a parity-check bit. <math>\gamma</math> is effectively the probability that a two-bit mutation occurs in the sequence; such a mutation preserves the parity of the whole sequence, so the parity-check bit still matches the actual parity of the sequence.
The macroscopic state of this process can therefore be taken as the parity of the sum over all dimensions of the entire sequence, and the probability distribution of this parity is the result of an exclusive-OR computation on the microstate. <math>x_{t+1}^1</math> is a special microstate that always stays consistent with the macroscopic state of the sequence at the previous moment. Hence, when only the first part of the second condition holds, downward causation occurs in the system; when only the second part holds, causal decoupling occurs; and when both hold simultaneously, causal emergence is said to occur in the system.
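A short Python sketch (our reading of the rule above, for illustration) builds the full <math>2^n\times 2^n</math> transition matrix and checks that every row is a normalized distribution:

<syntaxhighlight lang="python">
import numpy as np
from itertools import product

def parity_tpm(n, gamma):
    """Build the 2^n x 2^n transition matrix of the parity-check
    process described above (our reading of the rule in [37])."""
    states = list(product([0, 1], repeat=n))
    P = np.zeros((2 ** n, 2 ** n))
    for i, xt in enumerate(states):
        for j, xt1 in enumerate(states):
            # condition 1: parity of x_t must match the check bit x_{t+1}^1
            if sum(xt) % 2 != xt1[0]:
                continue  # probability 0
            # condition 2: x_t and x_{t+1} have the same overall parity
            if sum(xt) % 2 == sum(xt1) % 2:
                P[i, j] = gamma / 2 ** (n - 2)
            else:
                P[i, j] = (1 - gamma) / 2 ** (n - 2)
    return P

P = parity_tpm(4, gamma=0.9)
assert np.allclose(P.sum(axis=1), 1.0)  # each row is a valid distribution
</syntaxhighlight>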
====Causal Emergence Theory Based on Singular Value Decomposition====
Erik Hoel's causal emergence theory requires a coarse-graining strategy to be specified in advance, and Rosas's information decomposition theory does not completely solve this problem. Therefore, Zhang Jiang et al. [26] further proposed a causal emergence theory based on singular value decomposition.
=====Singular Value Decomposition of Markov Chains=====
Given the Markov transition matrix <math>P</math> of a system, we can perform singular value decomposition on it, obtaining two orthonormal matrices <math>U</math> and <math>V</math> and a diagonal matrix <math>\Sigma</math>: <math>P=U\Sigma V^T</math>, where [math]\Sigma=diag(\sigma_1,\sigma_2,\cdots,\sigma_N)[/math] and [math]\sigma_1\geq\sigma_2\geq\cdots\geq\sigma_N[/math] are the singular values of <math>P</math> arranged in descending order, and <math>N</math> is the number of states of <math>P</math>.
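As a minimal numpy illustration on a toy 4-state chain (our example, not one from the paper):

<syntaxhighlight lang="python">
import numpy as np

# Toy 4-state transition matrix (rows sum to 1); any row-stochastic P works.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

U, sigma, Vt = np.linalg.svd(P)                 # P = U @ diag(sigma) @ Vt
assert np.allclose(P, U @ np.diag(sigma) @ Vt)  # reconstruction check
print(sigma)                                    # descending singular values
</syntaxhighlight>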
=====Approximate Dynamical Reversibility and Effective Information=====
We can define the sum of the <math>\alpha</math>-th powers of the singular values (also known as the [math]\alpha[/math]-order Schatten norm of the matrix) as a measure of the approximate dynamical reversibility of the Markov chain:
<math>
\Gamma_{\alpha}\equiv \sum_{i=1}^N\sigma_i^{\alpha}
</math>
Here, [math]\alpha\in(0,2)[/math] is a specified parameter acting as a weight or preference that makes [math]\Gamma_{\alpha}[/math] lean more toward determinism or more toward degeneracy. Normally we take [math]\alpha=1[/math], which lets [math]\Gamma_{\alpha}[/math] strike a balance between determinism and degeneracy.
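A small helper (our illustration) computes [math]\Gamma_{\alpha}[/math] from the singular values. Two limiting cases make the reversibility reading tangible: a permutation chain, which is deterministic and invertible, attains <math>\Gamma_1=N</math>, while a maximally noisy chain collapses to <math>\Gamma_1=1</math>:

<syntaxhighlight lang="python">
import numpy as np

def gamma_alpha(P, alpha=1.0):
    """Approximate dynamical reversibility: sum of alpha-th powers of
    the singular values (the alpha-order Schatten norm) of the TPM P."""
    sigma = np.linalg.svd(P, compute_uv=False)
    return (sigma ** alpha).sum()

P_perm = np.eye(4)[[1, 2, 3, 0]]   # deterministic, invertible: all sigma = 1
print(gamma_alpha(P_perm))         # 4.0 = N
P_noise = np.full((4, 4), 0.25)    # maximal noise: rank 1, sigma = (1,0,0,0)
print(gamma_alpha(P_noise))        # 1.0
</syntaxhighlight>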
In addition, the authors prove that there is an approximate relationship between <math>EI</math> and [math]\Gamma_{\alpha}[/math]:
<math>
EI\sim \log\Gamma_{\alpha}
</math>
Moreover, to a certain extent, [math]\Gamma_{\alpha}[/math] can be used in place of <math>EI</math> to measure the strength of the causal effect of a Markov chain. The so-called causal emergence can therefore also be understood as an '''emergence of dynamical reversibility'''.
=====Quantification of Causal Emergence without Coarse-graining=====
However, the greatest value of this theory is that emergence can be quantified directly, without any coarse-graining strategy. If the rank of <math>P</math> is <math>r</math>, that is, all singular values from the <math>(r+1)</math>-th onward are 0, then we say the dynamics <math>P</math> exhibits '''clear causal emergence''', with magnitude:
<math>
\Delta \Gamma_{\alpha} = \Gamma_{\alpha}(1/r - 1/N)
</math>
If the matrix <math>P</math> is full rank, but for any given small number <math>\epsilon</math> there exists <math>r_{\epsilon}</math> such that all singular values from the <math>(r_{\epsilon}+1)</math>-th onward are less than <math>\epsilon</math>, then the system is said to exhibit a degree of '''vague causal emergence''', with magnitude:
<math>\Delta \Gamma_{\alpha}(\epsilon) = \frac{\sum_{i=1}^{r_{\epsilon}} \sigma_{i}^{\alpha}}{r_{\epsilon}} - \frac{\sum_{i=1}^{N} \sigma_{i}^{\alpha}}{N} </math>
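Both cases can be computed directly from the singular values. The sketch below is our illustration, not the paper's code: with <math>\epsilon</math> unspecified it uses the exact rank <math>r</math> (clear causal emergence), otherwise the effective rank <math>r_{\epsilon}</math> (vague causal emergence). On the toy block matrix used earlier it gives <math>\Delta\Gamma_1=3(1/3-1/4)=0.25>0</math>:

<syntaxhighlight lang="python">
import numpy as np

def delta_gamma(P, alpha=1.0, eps=None):
    """Causal emergence from singular values: exact rank r if eps is None
    (clear causal emergence), else effective rank r_eps (vague case)."""
    N = P.shape[0]
    sigma = np.linalg.svd(P, compute_uv=False)
    s = sigma ** alpha
    if eps is None:
        r = int(np.linalg.matrix_rank(P))
        return s.sum() * (1 / r - 1 / N)
    r_eps = int((sigma >= eps).sum())  # sigma_{r_eps+1}, ... fall below eps
    return s[:r_eps].sum() / r_eps - s.sum() / N

P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
print(delta_gamma(P))  # 3 * (1/3 - 1/4) = 0.25 > 0: clear causal emergence
</syntaxhighlight>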
In summary, the advantage of this quantification of causal emergence is that it is more objective and does not depend on a specific coarse-graining strategy. Its disadvantage is that computing [math]\Gamma_{\alpha}[/math] requires performing SVD on <math>P</math> first, so the computational complexity is [math]O(N^3)[/math], higher than that of <math>EI</math>. Moreover, [math]\Gamma_{\alpha}[/math] cannot be explicitly decomposed into determinism and degeneracy components.
=====Specific Example=====
[[文件:Gamma例子.png|居左|500x500像素|<math>EI</math>与<math>\Gamma</math>对比]]
The authors give four concrete examples of Markov chains, whose state transition matrices are shown in the figure. We can compare the <math>EI</math> and the approximate dynamical reversibility (the <math>\Gamma</math> in the figure, i.e. <math>\Gamma_{\alpha=1}</math>) of these Markov chains. Comparing figures a and b, we find that across different state transition matrices, when <math>EI</math> decreases, <math>\Gamma</math> decreases as well. Figures c and d compare the effects before and after coarse-graining, where figure d is the coarse-graining of the state transition matrix in figure c (the first three states are merged into one macroscopic state). Since the macroscopic state transition matrix in figure d is a deterministic system, the normalized <math>EI</math>, <math>eff\equiv EI/\log N</math>, and the normalized [math]\Gamma[/math], <math>\gamma\equiv \Gamma/N</math>, both reach the maximum value of 1.
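As a minimal check of the last claim (assuming the standard uniform-intervention definition of <math>EI</math> for Markov chains), a deterministic, non-degenerate permutation matrix indeed attains <math>eff=1</math> and <math>\gamma=1</math> simultaneously:

<syntaxhighlight lang="python">
import numpy as np

def ei(P):
    """Effective information of a TPM under a uniform do-intervention:
    EI = (1/N) * sum_ij P_ij * log2(P_ij / mean_k P_kj)."""
    N = P.shape[0]
    pbar = np.broadcast_to(P.mean(axis=0), P.shape)  # effect distribution
    nz = P > 0
    return (P[nz] * np.log2(P[nz] / pbar[nz])).sum() / N

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])  # deterministic, non-degenerate macro dynamics
N = P.shape[0]
gamma = np.linalg.svd(P, compute_uv=False).sum() / N
print(ei(P) / np.log2(N), gamma)  # eff = 1.0, gamma = 1.0
</syntaxhighlight>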