Line 103: |
Line 103: |
| == Quantification of causal emergence == | | == Quantification of causal emergence == |
| Next, we will focus on introducing several studies that use causal measures to quantify emergence phenomena. | | Next, we will focus on introducing several studies that use causal measures to quantify emergence phenomena. |
| + | |
| | | |
| === Several causal emergence theories === | | === Several causal emergence theories === |
− | How to define causal emergence is a key issue. There are several representative works, namely the method based on effective information proposed by Hoel et al. [1][2], the method based on information decomposition proposed by Rosas et al. [37], a new causal emergence theory based on singular value decomposition proposed by Zhang Jiang et al. [26], and some other theories. | + | How to define causal emergence is a key issue. There are several representative works, namely the method based on effective information proposed by Hoel et al. <ref name=":0" /><ref name=":1" />, the method based on information decomposition proposed by Rosas et al. <ref name=":5">Rosas F E, Mediano P A, Jensen H J, et al. Reconciling emergences: An information-theoretic approach to identify causal emergence in multivariate data[J]. PLoS computational biology, 2020, 16(12): e1008289.</ref>, a new causal emergence theory based on singular value decomposition proposed by Zhang Jiang et al. <ref name=":2" />, and some other theories. |
| + | |
| | | |
| ==== Erik Hoel's causal emergence theory ==== | | ==== Erik Hoel's causal emergence theory ==== |
− | In 2013, Hoel et al. [1][2] proposed the causal emergence theory. The following figure is an abstract framework for this theory. The horizontal axis represents time and the vertical axis represents scale. This framework can be regarded as a description of the same dynamical system on both microscopic and macroscopic scales. Among them, [math]f_m[/math] represents microscopic dynamics, [math]f_M[/math] represents macroscopic dynamics, and the two are connected by a coarse-graining function [math]\phi[/math]. In a discrete-state Markov dynamical system, both [math]f_m[/math] and [math]f_M[/math] are Markov chains. By performing coarse-graining of the Markov chain on [math]f_m[/math], [math]f_M[/math] can be obtained. <math> EI </math> is a measure of effective information. Since the microscopic state may have greater randomness, which leads to relatively weak causality of microscopic dynamics, by performing reasonable coarse-graining on the microscopic state at each moment, it is possible to obtain a macroscopic state with stronger causality. The so-called causal emergence refers to the phenomenon that when we perform coarse-graining on the microscopic state, the effective information of macroscopic dynamics will increase, and the difference in effective information between the macroscopic state and the microscopic state is defined as the intensity of causal emergence. | + | In 2013, Hoel et al. <ref name=":0" /><ref name=":1" /> proposed the causal emergence theory. The following figure is an abstract framework for this theory. The horizontal axis represents time and the vertical axis represents scale. This framework can be regarded as a description of the same dynamical system on both microscopic and macroscopic scales. Among them, [math]f_m[/math] represents microscopic dynamics, [math]f_M[/math] represents macroscopic dynamics, and the two are connected by a coarse-graining function [math]\phi[/math]. 
In a discrete-state Markov dynamical system, both [math]f_m[/math] and [math]f_M[/math] are Markov chains. By performing coarse-graining of the Markov chain on [math]f_m[/math], [math]f_M[/math] can be obtained. <math> EI </math> is a measure of effective information. Since the microscopic state may have greater randomness, which leads to relatively weak causality of microscopic dynamics, by performing reasonable coarse-graining on the microscopic state at each moment, it is possible to obtain a macroscopic state with stronger causality. The so-called causal emergence refers to the phenomenon that when we perform coarse-graining on the microscopic state, the effective information of macroscopic dynamics will increase, and the difference in effective information between the macroscopic state and the microscopic state is defined as the intensity of causal emergence. |
− | [[文件:因果涌现理论框架.png|无|缩略图]] | + | |
| + | [[文件:因果涌现理论.png|Framework of causal emergence theory|alt=Abstract framework of causal emergence theory|居左|400x400像素]] |
| | | |
| ===== Effective Information ===== | | ===== Effective Information ===== |
− | Effective Information (<math> EI </math>) was first proposed by Tononi et al. in the study of integrated information theory [41]. In causal emergence research, Erik Hoel and others use this causal effect measure index to quantify the strength of causality of a causal mechanism. | + | Effective Information (<math> EI </math>) was first proposed by Tononi et al. in the study of integrated information theory <ref>Tononi G, Sporns O. Measuring information integration[J]. BMC neuroscience, 2003, 4: 1-20.</ref>. In causal emergence research, Erik Hoel and others use this causal-effect measure to quantify the causal strength of a causal mechanism. |
| + | |
| | | |
| Specifically, the calculation of <math> EI </math> is as follows: use an intervention operation to intervene on the independent variable and examine the mutual information between the cause and effect variables under this intervention. This mutual information is effective information, that is, the causal effect measure of the causal mechanism. | | Specifically, the calculation of <math> EI </math> is as follows: use an intervention operation to intervene on the independent variable and examine the mutual information between the cause and effect variables under this intervention. This mutual information is effective information, that is, the causal effect measure of the causal mechanism. |
| + | |
| | | |
| In a Markov chain, the state variable [math]X_t[/math] at any time can be regarded as the cause, and the state variable [math]X_{t + 1}[/math] at the next time can be regarded as the result. Thus, the state transition matrix of the Markov chain is its causal mechanism. Therefore, the calculation formula for <math>EI</math> for a Markov chain is as follows: | | In a Markov chain, the state variable [math]X_t[/math] at any time can be regarded as the cause, and the state variable [math]X_{t + 1}[/math] at the next time can be regarded as the result. Thus, the state transition matrix of the Markov chain is its causal mechanism. Therefore, the calculation formula for <math>EI</math> for a Markov chain is as follows: |
Line 127: |
Line 132: |
| Here <math>f</math> represents the state transition matrix of a Markov chain, [math]U(\mathcal{X})[/math] represents the uniform distribution on the value space [math]\mathcal{X}[/math] of the state variable [math]X_t[/math]. <math>\tilde{X}_t,\tilde{X}_{t+1}</math> are the states at two consecutive moments after intervening [math]X_t[/math] at time <math>t</math> into a uniform distribution. <math>p_{ij}</math> is the transition probability from the <math>i</math>-th state to the <math>j</math>-th state. From this formula, it is not difficult to see that <math> EI </math> is only a function of the probability transition matrix [math]f[/math]. The intervention operation is performed to make the effective information objectively measure the causal characteristics of the dynamics without being affected by the distribution of the original input data. | | Here <math>f</math> represents the state transition matrix of a Markov chain, [math]U(\mathcal{X})[/math] represents the uniform distribution on the value space [math]\mathcal{X}[/math] of the state variable [math]X_t[/math]. <math>\tilde{X}_t,\tilde{X}_{t+1}</math> are the states at two consecutive moments after intervening [math]X_t[/math] at time <math>t</math> into a uniform distribution. <math>p_{ij}</math> is the transition probability from the <math>i</math>-th state to the <math>j</math>-th state. From this formula, it is not difficult to see that <math> EI </math> is only a function of the probability transition matrix [math]f[/math]. The intervention operation is performed to make the effective information objectively measure the causal characteristics of the dynamics without being affected by the distribution of the original input data. |
| | | |
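The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's own code: the function name `effective_information` and the list-of-rows matrix representation are assumptions made here, and the formula used is the standard form of <math>EI</math> under a uniform intervention, namely the average Kullback-Leibler divergence (in bits) between each row of the transition matrix and the mean of all rows.

```python
import math

def effective_information(tpm):
    """EI (in bits) of a row-stochastic transition matrix given as a list of rows.

    Hypothetical helper: assumes the intervention do(X_t ~ Uniform) from the text.
    """
    n = len(tpm)
    # Effect distribution under the uniform intervention: the mean of all rows.
    avg = [sum(row[j] for row in tpm) / n for j in range(len(tpm[0]))]
    ei = 0.0
    for row in tpm:
        for p, q in zip(row, avg):
            if p > 0:  # the term 0 * log(0) is taken to be 0
                ei += p * math.log2(p / q)
    return ei / n

identity = [[1.0, 0.0], [0.0, 1.0]]   # fully deterministic, non-degenerate
noisy = [[0.5, 0.5], [0.5, 0.5]]      # the next state ignores the current state

print(effective_information(identity))  # 1.0
print(effective_information(noisy))     # 0.0
```

Because the input distribution is fixed to be uniform inside the function, the result depends only on the transition matrix, which is the point made in the paragraph above.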
− | Effective information can be decomposed into two parts: determinism and degeneracy. Normalization can also be introduced to eliminate the influence of the size of the state space. For more detailed information about effective information, please refer to the entry: Effective Information. | + | |
| + | Effective information can be decomposed into two parts: '''determinism''' and '''degeneracy'''. Normalization can also be introduced to eliminate the influence of the size of the state space. For more detailed information about effective information, please refer to the entry: Effective Information. |
| + | |
| | | |
| =====Causal Emergence Measurement===== | | =====Causal Emergence Measurement===== |
Line 137: |
Line 144: |
| | | |
| Here <math>CE</math> is the causal emergence intensity. If the effective information of macroscopic dynamics is greater than that of microscopic dynamics (that is, <math>CE>0</math>), then we consider that macroscopic dynamics has causal emergence characteristics on the basis of this coarse-graining. | | Here <math>CE</math> is the causal emergence intensity. If the effective information of macroscopic dynamics is greater than that of microscopic dynamics (that is, <math>CE>0</math>), then we consider that macroscopic dynamics has causal emergence characteristics on the basis of this coarse-graining. |
| + | |
| | | |
| =====Markov Chain Example===== | | =====Markov Chain Example===== |
− | In the literature [1], Hoel gives an example of a state transition matrix ([math]f_m[/math]) of a Markov chain with 8 states, as shown in the left figure below. Among them, the first 7 states transfer with equal probability, and the last state is independent and can only transition to its own state. | + | In the literature <ref name=":0"/>, Hoel gives an example of a state transition matrix ([math]f_m[/math]) of a Markov chain with 8 states, as shown in the left figure below. Among them, the first 7 states transfer with equal probability, and the last state is independent and can only transition to its own state. |
| + | |
| | | |
| | The coarse-graining of this matrix proceeds as follows. First, merge the first 7 states into a single macroscopic state, called A. Then sum the probability values in the first 7 columns of the first 7 rows of [math]f_m[/math], averaging over the merged rows, to obtain the probability of transitioning from macroscopic state A to state A, and keep the other values of the [math]f_m[/math] matrix unchanged. The merged probability transition matrix, denoted [math]f_M[/math], is shown in the right figure. It is a deterministic macroscopic Markov transition matrix: the future state of the system is completely determined by the current state. In this case <math>EI(f_M)>EI(f_m)</math>, so causal emergence occurs in the system. | | The coarse-graining of this matrix proceeds as follows. First, merge the first 7 states into a single macroscopic state, called A. Then sum the probability values in the first 7 columns of the first 7 rows of [math]f_m[/math], averaging over the merged rows, to obtain the probability of transitioning from macroscopic state A to state A, and keep the other values of the [math]f_m[/math] matrix unchanged. The merged probability transition matrix, denoted [math]f_M[/math], is shown in the right figure. It is a deterministic macroscopic Markov transition matrix: the future state of the system is completely determined by the current state. In this case <math>EI(f_M)>EI(f_m)</math>, so causal emergence occurs in the system. |
− | [[文件:状态空间中的因果涌现1.png|无|缩略图]] | + | |
| + | [[文件:状态空间中的因果涌现1.png|居左|500x500像素|Causal emergence on the state space|替代=]] |
| + | |
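The 8-state example can be checked numerically. The sketch below is an illustration under stated assumptions rather than code from the cited paper: `effective_information` and `coarse_grain` are hypothetical helper names, and the coarse-graining rule (sum over the target columns of a group, then average over the source rows of that group) is one reading of the merging step described above.

```python
import math

def effective_information(tpm):
    """EI (in bits) of a row-stochastic matrix under a uniform intervention."""
    n = len(tpm)
    avg = [sum(row[j] for row in tpm) / n for j in range(len(tpm[0]))]
    return sum(p * math.log2(p / q)
               for row in tpm for p, q in zip(row, avg) if p > 0) / n

def coarse_grain(tpm, groups):
    """Merge micro states into macro states (assumed rule: sum columns, average rows)."""
    macro = []
    for src in groups:
        row = []
        for dst in groups:
            total = sum(tpm[i][j] for i in src for j in dst)
            row.append(total / len(src))
        macro.append(row)
    return macro

# Micro dynamics f_m: states 0..6 move uniformly among themselves, state 7 is absorbing.
f_m = [[1 / 7] * 7 + [0.0] for _ in range(7)] + [[0.0] * 7 + [1.0]]

groups = [list(range(7)), [7]]      # macro state A = {0..6}, macro state B = {7}
f_M = coarse_grain(f_m, groups)

ei_micro = effective_information(f_m)
ei_macro = effective_information(f_M)
ce = ei_macro - ei_micro            # causal emergence intensity CE
print(ei_micro, ei_macro, ce)
```

With these definitions the microscopic <math>EI</math> is about 0.54 bits and the macroscopic <math>EI</math> is 1 bit, so <math>CE</math> is about 0.46 bits and positive, in line with the conclusion stated for this example.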
| + | However, for more general Markov chains and more general state groupings, this simple operation of averaging probabilities is not always feasible, because the merged probability transition matrix may not satisfy the conditions of a Markov chain (for example, its rows may not satisfy the normalization condition, or its elements may fall outside the range [0, 1]). For the conditions under which a Markov chain and a state grouping yield a feasible macroscopic Markov chain, please refer to the section “Reduction of Markov Chains” later in this entry, or see the entry “Coarse-graining of Markov Chains”. |
| + | |
| | | |
| =====Boolean Network Example===== | | =====Boolean Network Example===== |
| + | Another example in the literature <ref name=":0"/> shows causal emergence in a Boolean network. As shown in the figure, the network has 4 nodes, each with two states, 0 and 1. Each node is connected to two other nodes and follows the same microscopic dynamics mechanism (figure a). The system therefore has sixteen microscopic states in total, and its dynamics can be represented by a state transition matrix (figure c). |
| | | |
− | Another example in the literature [1] is an example of causal emergence in a Boolean network. As shown in the figure, this is a Boolean network with 4 nodes. Each node has two states, 0 and 1. Each node is connected to two other nodes and follows the same microscopic dynamics mechanism (figure a). Therefore, this system contains a total of sixteen microscopic states, and its dynamics can be represented by a state transition matrix (figure c).
| |
| | | |
| | The coarse-graining operation of this system is divided into two steps. The first step is to cluster the nodes in the Boolean network: as shown in figure b below, A and B are merged to obtain the macroscopic node [math]\alpha[/math], and C and D are merged to obtain the macroscopic node [math]\beta[/math]. The second step is to map the microscopic node states in each group to the states of the merged macroscopic node. This mapping function is shown in figure d below: all microscopic node states containing a 0 are mapped to the off state of the macroscopic node, while the microscopic state 11 is mapped to the on state of the macroscopic node. In this way, we obtain a new macroscopic Boolean network whose dynamic mechanism is derived from that of the microscopic nodes. From this mechanism, the state transition matrix of the macroscopic network can be obtained (as shown in figure e). | | The coarse-graining operation of this system is divided into two steps. The first step is to cluster the nodes in the Boolean network: as shown in figure b below, A and B are merged to obtain the macroscopic node [math]\alpha[/math], and C and D are merged to obtain the macroscopic node [math]\beta[/math]. The second step is to map the microscopic node states in each group to the states of the merged macroscopic node. This mapping function is shown in figure d below: all microscopic node states containing a 0 are mapped to the off state of the macroscopic node, while the microscopic state 11 is mapped to the on state of the macroscopic node. In this way, we obtain a new macroscopic Boolean network whose dynamic mechanism is derived from that of the microscopic nodes. From this mechanism, the state transition matrix of the macroscopic network can be obtained (as shown in figure e). |
| + | |
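The two-step coarse-graining just described (group the nodes, then map each group's micro states to a macro state) can be written as a small mapping function. This sketch covers only the state mapping; the microscopic update rule lives in the figure and is not reproduced here. The names `macro_node` and `phi` are illustrative choices, and encoding off and on as 0 and 1 is an assumption.

```python
from itertools import product

def macro_node(a, b):
    """Map a pair of micro node states to one macro node state: on (1) only for 11."""
    return 1 if (a, b) == (1, 1) else 0

def phi(micro):
    """Coarse-grain a micro state (A, B, C, D) into the macro state (alpha, beta)."""
    a, b, c, d = micro
    return (macro_node(a, b), macro_node(c, d))

# Group the sixteen micro states by the macro state they are mapped to.
preimages = {}
for micro in product((0, 1), repeat=4):
    preimages.setdefault(phi(micro), []).append(micro)

for macro, micros in sorted(preimages.items()):
    print(macro, len(micros))
```

The mapping is many-to-one and uneven: nine micro states collapse onto the macro state (0, 0), three each onto (0, 1) and (1, 0), and only 1111 maps to (1, 1).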
| | | |
| | Through comparison, we find that the effective information of the macroscopic dynamics is greater than that of the microscopic dynamics, <math>EI(f_M)>EI(f_m)</math>, so causal emergence occurs in this system. | | Through comparison, we find that the effective information of the macroscopic dynamics is greater than that of the microscopic dynamics, <math>EI(f_M)>EI(f_m)</math>, so causal emergence occurs in this system. |
− | [[文件:含有4个节点的布尔网络.png|无|缩略图]] | + | |
| + | [[文件:含有4个节点的布尔网络.png|居左|700x700像素|Causal emergence on a discrete Boolean network|替代=Causal emergence in a Boolean network with 4 nodes]] |
| | | |
| =====Causal Emergence in Continuous Variables===== | | =====Causal Emergence in Continuous Variables===== |
| + | Furthermore, in the paper <ref name="Chvykov_causal_geometry">{{cite journal|author1=Chvykov P|author2=Hoel E.|title=Causal Geometry|journal=Entropy|year=2021|volume=23|issue=1|page=24|url=https://doi.org/10.3390/e2}}</ref>, Chvykov and Hoel proposed the theoretical framework of causal geometry, which attempts to generalize causal emergence theory to function mappings and dynamical systems with continuous states. The paper defines <math>EI</math> for stochastic function mappings, introduces the concepts of intervention noise and causal geometry, and compares causal geometry with information geometry. Liu Kaiwei et al.<ref name="An_exact_theory_of_causal_emergence">{{cite journal|author1=Liu K|author2=Yuan B|author3=Zhang J|title=An Exact Theory of Causal Emergence for Linear Stochastic Iteration Systems|journal=Entropy|year=2024|volume=26|issue=8|page=618|url=https://arxiv.org/abs/2405.09207}}</ref> further gave an exact analytical theory of causal emergence for linear stochastic iteration systems. |
| | | |
− | Furthermore, in the paper [42], Hoel et al. proposed the theoretical framework of causal geometry, trying to generalize the causal emergence theory to function mappings and dynamical systems with continuous states. This article defines <math>EI</math> for random function mapping, and also introduces the concepts of intervention noise and causal geometry, and compares and analogizes this concept with information geometry. Liu Kaiwei et al.[43] further gave an exact analytical causal emergence theory for random iterative dynamical systems.
| |
| | | |
| ====Rosas's Causal Emergence Theory==== | | ====Rosas's Causal Emergence Theory==== |
− | | + | From the perspective of [[information decomposition]] theory, Rosas et al. <ref name=":5" /> propose a method for defining causal emergence based on [[integrated information decomposition]], and further divide causal emergence into two parts: [[causal decoupling]] and [[downward causation]]. Causal decoupling represents the causal effect of the macroscopic state at the current moment on the macroscopic state at the next moment, and downward causation represents the causal effect of the macroscopic state at the current moment on the microscopic state at the next moment. The schematic diagrams of causal decoupling and downward causation are shown in the following figure. The microscopic state is <math>X_t=(X_t^1,X_t^2,\ldots,X_t^n)</math>, and the macroscopic state <math>V_t</math> is obtained by coarse-graining the microscopic state variable <math>X_t</math>, so it is a supervenient feature of <math>X_t</math>. <math>X_{t + 1}</math> and <math>V_{t + 1}</math> represent the microscopic and macroscopic states at the next moment, respectively. |
− | Rosas et al. [37] From the perspective of [[information decomposition]] theory, propose a method for defining causal emergence based on [[integrated information decomposition]], and further divide causal emergence into two parts: [[causal decoupling]] (Causal Decoupling) and [[downward causation]] (Downward Causation). Among them, causal decoupling represents the causal effect of the macroscopic state at the current moment on the macroscopic state at the next moment, and downward causation represents the causal effect of the macroscopic state at the previous moment on the microscopic state at the next moment. The schematic diagrams of causal decoupling and downward causation are shown in the following figure. The microscopic state input is <math>X_t\ (X_t^1,X_t^2,…,X_t^n ) </math>, and the macroscopic state is <math>V_t </math>, which is obtained by coarse-graining the microscopic state variable <math>X_t </math>, so it is a supervenient feature of <math>X_t </math>, <math>X_{t + 1} </math> and <math>V_{t + 1} </math> represent the microscopic and macroscopic states at the next moment respectively. | |
| | | |
| | [[文件:向下因果与因果解耦2.png|居左|300x300像素|Causal decoupling and downward causation]] | | [[文件:向下因果与因果解耦2.png|居左|300x300像素|Causal decoupling and downward causation]] |
| | | |
| =====Partial Information Decomposition===== | | =====Partial Information Decomposition===== |
| + | This method is based on the nonnegative decomposition of multivariate information proposed by Williams and Beer <ref name=":16" />. Rosas et al. use partial information decomposition (PID) to decompose the mutual information between microstates and macrostates. |
| | | |
− | This method is based on the nonnegative decomposition of multivariate information theory proposed by Williams and Beer et al [44]. This paper uses partial information decomposition (PID) to decompose the mutual information between microstates and macrostates.
| |
| | | |
| Without loss of generality, assume that our microstate is <math>X(X^1,X^2)</math>, that is, it is a two-dimensional variable, and the macrostate is <math>V</math>. Then the mutual information between the two can be decomposed into four parts: | | Without loss of generality, assume that our microstate is <math>X(X^1,X^2)</math>, that is, it is a two-dimensional variable, and the macrostate is <math>V</math>. Then the mutual information between the two can be decomposed into four parts: |
Line 172: |
Line 186: |
| | | |
| Among them, <math>Red(X^1,X^2;V)</math> represents redundant information, which refers to the information repeatedly provided by two microstates <math>X^1</math> and <math>X^2</math> to the macrostate <math>V</math>; <math>Un(X^1;V│X^2)</math> and <math>Un(X^2;V│X^1)</math> represent unique information, which refers to the information provided by each microstate variable alone to the macrostate; <math>Syn(X^1,X^2;V)</math> represents synergistic information, which refers to the information provided by all microstates <math>X</math> jointly to the macrostate <math>V</math>. | | Among them, <math>Red(X^1,X^2;V)</math> represents redundant information, which refers to the information repeatedly provided by two microstates <math>X^1</math> and <math>X^2</math> to the macrostate <math>V</math>; <math>Un(X^1;V│X^2)</math> and <math>Un(X^2;V│X^1)</math> represent unique information, which refers to the information provided by each microstate variable alone to the macrostate; <math>Syn(X^1,X^2;V)</math> represents synergistic information, which refers to the information provided by all microstates <math>X</math> jointly to the macrostate <math>V</math>. |
| + | |
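To make the four terms concrete, the sketch below computes a two-source PID for a toy joint distribution. It assumes the redundancy measure <math>I_{min}</math> of Williams and Beer (the expected minimum specific information over sources); other redundancy measures exist and give different numbers. All function names here are illustrative, and the XOR example is chosen because its decomposition is well known to be purely synergistic under this measure.

```python
import math
from itertools import product

def mutual_information(joint, src_idx):
    """I(S;V) in bits; S is the sub-tuple of (x1, x2) selected by src_idx."""
    p_sv, p_s, p_v = {}, {}, {}
    for (x1, x2, v), p in joint.items():
        s = tuple((x1, x2)[i] for i in src_idx)
        p_sv[(s, v)] = p_sv.get((s, v), 0.0) + p
        p_s[s] = p_s.get(s, 0.0) + p
        p_v[v] = p_v.get(v, 0.0) + p
    return sum(p * math.log2(p / (p_s[s] * p_v[v]))
               for (s, v), p in p_sv.items() if p > 0)

def specific_information(joint, src_idx, v0):
    """Specific information I(V=v0; S) = sum_s p(s|v0) * log2(p(v0|s) / p(v0))."""
    p_s, p_sv0, p_v0 = {}, {}, 0.0
    for (x1, x2, v), p in joint.items():
        s = tuple((x1, x2)[i] for i in src_idx)
        p_s[s] = p_s.get(s, 0.0) + p
        if v == v0:
            p_sv0[s] = p_sv0.get(s, 0.0) + p
            p_v0 += p
    return sum((p / p_v0) * math.log2((p / p_s[s]) / p_v0)
               for s, p in p_sv0.items() if p > 0)

def pid(joint):
    """Two-source PID using I_min as the redundancy (an assumed choice of measure)."""
    p_v = {}
    for (_, _, v), p in joint.items():
        p_v[v] = p_v.get(v, 0.0) + p
    red = sum(pv * min(specific_information(joint, (0,), v),
                       specific_information(joint, (1,), v))
              for v, pv in p_v.items() if pv > 0)
    un1 = mutual_information(joint, (0,)) - red
    un2 = mutual_information(joint, (1,)) - red
    syn = mutual_information(joint, (0, 1)) - red - un1 - un2
    return {"Red": red, "Un1": un1, "Un2": un2, "Syn": syn}

# V = X1 XOR X2 with uniform, independent inputs: keys are (x1, x2, v).
xor_joint = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}
xor_pid = pid(xor_joint)
print(xor_pid)
```

For the XOR distribution neither input alone says anything about <math>V</math>, so the redundant and unique terms vanish and the full 1 bit of mutual information is synergistic.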
| | | |
| =====Definition of Causal Emergence===== | | =====Definition of Causal Emergence===== |
| + | However, the PID framework can only decompose the mutual information between multiple source variables and a single target variable. Rosas et al. extended this framework and proposed integrated information decomposition, <math>\Phi ID</math><ref name=":18" />, to handle the mutual information between multiple source variables and multiple target variables; it can also be used to decompose the mutual information between different moments. Based on the decomposed information, the authors proposed two definitions of causal emergence: |
| | | |
− | However, the PID framework can only decompose the mutual information between multiple source variables and one target variable. Rosas extended this framework and proposed the integrated information decomposition method <math>\Phi ID</math>[45] to handle the mutual information between multiple source variables and multiple target variables. It can also be used to decompose the mutual information between different moments. Based on the decomposed information, the author proposed two definition methods of causal emergence:
| |
| | | |
| 1) When the unique information <math>Un(V_t;X_{t+1}| X_t^1,\ldots,X_t^n\ )>0 </math>, it means that the macroscopic state <math>V_t</math> at the current moment can provide more information to the overall system <math>X_{t + 1}</math> at the next moment than the microscopic state <math>X_t</math> at the current moment. At this time, there is causal emergence in the system; | | 1) When the unique information <math>Un(V_t;X_{t+1}| X_t^1,\ldots,X_t^n\ )>0 </math>, it means that the macroscopic state <math>V_t</math> at the current moment can provide more information to the overall system <math>X_{t + 1}</math> at the next moment than the microscopic state <math>X_t</math> at the current moment. At this time, there is causal emergence in the system; |
| | | |
| 2) The second method bypasses the selection of a specific macroscopic state <math>V_t</math>, and defines causal emergence only based on the synergistic information between the microscopic state <math>X_t</math> and the microscopic state <math>X_{t + 1}</math> at the next moment of the system. When the synergistic information <math>Syn(X_t^1,…,X_t^n;X_{t + 1}^1,…,X_{t + 1}^n)>0</math>, causal emergence occurs in the system. | | 2) The second method bypasses the selection of a specific macroscopic state <math>V_t</math>, and defines causal emergence only based on the synergistic information between the microscopic state <math>X_t</math> and the microscopic state <math>X_{t + 1}</math> at the next moment of the system. When the synergistic information <math>Syn(X_t^1,…,X_t^n;X_{t + 1}^1,…,X_{t + 1}^n)>0</math>, causal emergence occurs in the system. |
| + | |
| | | |
| It should be noted that for the first method to judge the occurrence of causal emergence, it depends on the selection of the macroscopic state <math>V_t</math>. The first method is the lower bound of the second method. This is because <math>Syn(X_t;X_{t+1}\ ) ≥ Un(V_t;X_{t+1}| X_t\ )</math> always holds. So, if <math>Un(V_t;X_{t + 1}|X_t)</math> is greater than 0, then causal emergence occurs in the system. However, the selection of <math>V_t</math> often requires predefining a coarse-graining function, so the limitations of the Erik Hoel causal emergence theory cannot be avoided. Another natural idea is to use the second method to judge the occurrence of causal emergence with the help of synergistic information. However, the calculation of synergistic information is very difficult and there is a combinatorial explosion problem. Therefore, the calculation based on synergistic information in the second method is often infeasible. In short, both quantitative characterization methods of causal emergence have some weaknesses, so a more reasonable quantification method needs to be proposed. | | It should be noted that for the first method to judge the occurrence of causal emergence, it depends on the selection of the macroscopic state <math>V_t</math>. The first method is the lower bound of the second method. This is because <math>Syn(X_t;X_{t+1}\ ) ≥ Un(V_t;X_{t+1}| X_t\ )</math> always holds. So, if <math>Un(V_t;X_{t + 1}|X_t)</math> is greater than 0, then causal emergence occurs in the system. However, the selection of <math>V_t</math> often requires predefining a coarse-graining function, so the limitations of the Erik Hoel causal emergence theory cannot be avoided. Another natural idea is to use the second method to judge the occurrence of causal emergence with the help of synergistic information. However, the calculation of synergistic information is very difficult and there is a combinatorial explosion problem. 
Therefore, the calculation based on synergistic information in the second method is often infeasible. In short, both quantitative characterization methods of causal emergence have some weaknesses, so a more reasonable quantification method needs to be proposed. |
| + | |
| | | |
| =====Specific Example===== | | =====Specific Example===== |
Line 187: |
Line 204: |
| | [[文件:因果解耦以及向下因果例子1.png|500x500像素|居左|Example of causal decoupling and downward causation]] | | [[文件:因果解耦以及向下因果例子1.png|500x500像素|居左|Example of causal decoupling and downward causation]] |
| | | |
− | The author of the paper [37] lists a specific example (as above), to illustrate when causal decoupling, downward causation and causal emergence occur. This example is a special Markov process. Here, <math>p_{X_{t + 1}|X_t}(x_{t + 1}|x_t)</math> represents the dynamic relationship, and <math>X_t=(x_t^1,…,x_t^n)\in\{0,1\}^n</math> is the microstate. The definition of this process is to determine the probability of taking different values of the state <math>x_{t + 1}</math> at the next moment by checking the values of the variables <math>x_t</math> and <math>x_{t + 1}</math> at two consecutive moments, that is, judging whether the sum modulo 2 of all dimensions of <math>x_t</math> is the same as the first dimension of <math>x_{t + 1}</math>: if they are different, the probability is 0; otherwise, judge whether <math>x_t,x_{t + 1}</math> have the same sum modulo 2 value in all dimensions. If both conditions are satisfied, the value probability is <math>\gamma/2^{n - 2}</math>, otherwise the value probability is <math>(1-\gamma)/2^{n - 2}</math>. Here <math>\gamma</math> is a parameter and <math>n</math> is the total dimension of x. | + | The author of the paper <ref name=":5" /> lists a specific example (as above), to illustrate when causal decoupling, downward causation and causal emergence occur. This example is a special Markov process. Here, <math>p_{X_{t + 1}|X_t}(x_{t + 1}|x_t)</math> represents the dynamic relationship, and <math>X_t=(x_t^1,…,x_t^n)\in\{0,1\}^n</math> is the microstate. 
The definition of this process is to determine the probability of taking different values of the state <math>x_{t + 1}</math> at the next moment by checking the values of the variables <math>x_t</math> and <math>x_{t + 1}</math> at two consecutive moments, that is, judging whether the sum modulo 2 of all dimensions of <math>x_t</math> is the same as the first dimension of <math>x_{t + 1}</math>: if they are different, the probability is 0; otherwise, judge whether <math>x_t,x_{t + 1}</math> have the same sum modulo 2 value in all dimensions. If both conditions are satisfied, the value probability is <math>\gamma/2^{n - 2}</math>, otherwise the value probability is <math>(1-\gamma)/2^{n - 2}</math>. Here <math>\gamma</math> is a parameter and <math>n</math> is the total dimension of x. |
| + | |
| | | |
| In fact, if <math>\sum_{j = 1}^n x^j_t</math> is even or 0, then <math>\oplus^n_{j = 1} x^j_t:=1</math>, otherwise <math>\oplus^n_{j = 1} x^j_t:=0</math>. Therefore, the result of <math>\oplus^n_{j = 1} x^j_t</math> is the parity of the entire X sequence, and the first dimension can be regarded as a parity check bit. <math>\gamma</math> actually represents the probability that a mutation occurs in two bits of the X sequence, and this mutation can ensure that the parity of the entire sequence remains unchanged, and the parity check bit of the sequence also conforms to the actual parity of the entire sequence. | | In fact, if <math>\sum_{j = 1}^n x^j_t</math> is even or 0, then <math>\oplus^n_{j = 1} x^j_t:=1</math>, otherwise <math>\oplus^n_{j = 1} x^j_t:=0</math>. Therefore, the result of <math>\oplus^n_{j = 1} x^j_t</math> is the parity of the entire X sequence, and the first dimension can be regarded as a parity check bit. <math>\gamma</math> actually represents the probability that a mutation occurs in two bits of the X sequence, and this mutation can ensure that the parity of the entire sequence remains unchanged, and the parity check bit of the sequence also conforms to the actual parity of the entire sequence. |
| + | |
| | | |
| Therefore, the macroscopic state of this process can be regarded as the parity of the sum of all dimensions of the entire sequence, and the probability distribution of this parity is the result of the exclusive OR calculation of the microstate. <math>x_{t + 1}^1</math> is a special microstate that always remains consistent with the macroscopic state of the sequence at the previous moment. Therefore, when only the first item in the second judgment condition is satisfied, the downward causation condition of the system occurs. When only the second item is satisfied, the causal decoupling of the system occurs. When both items are satisfied simultaneously, it is said that causal emergence occurs in the system. | | Therefore, the macroscopic state of this process can be regarded as the parity of the sum of all dimensions of the entire sequence, and the probability distribution of this parity is the result of the exclusive OR calculation of the microstate. <math>x_{t + 1}^1</math> is a special microstate that always remains consistent with the macroscopic state of the sequence at the previous moment. Therefore, when only the first item in the second judgment condition is satisfied, the downward causation condition of the system occurs. When only the second item is satisfied, the causal decoupling of the system occurs. When both items are satisfied simultaneously, it is said that causal emergence occurs in the system. |
| + | |
| | | |
| ====Causal Emergence Theory Based on Singular Value Decomposition==== | | ====Causal Emergence Theory Based on Singular Value Decomposition==== |
| + | Erik Hoel's causal emergence theory has the problem of needing to specify a coarse-graining strategy in advance. Rosas' information decomposition theory does not completely solve this problem. Therefore, Zhang Jiang et al.<ref name=":2">Zhang J, Tao R, Yuan B. Dynamical Reversibility and A New Theory of Causal Emergence. arXiv preprint arXiv:2402.15054. 2024 Feb 23.</ref> further proposed the causal emergence theory based on singular value decomposition. |
| | | |
− | Erik Hoel's causal emergence theory has the problem of needing to specify a coarse-graining strategy in advance. Rosas' information decomposition theory does not completely solve this problem. Therefore, Zhang Jiang et al.[26] further proposed the causal emergence theory based on singular value decomposition.
| |
| | | |
| =====Singular Value Decomposition of Markov Chains===== | | =====Singular Value Decomposition of Markov Chains===== |
− |
| |
| Given the Markov transition matrix <math>P</math> of a system, we can perform singular value decomposition on it to obtain two orthogonal matrices <math>U</math> and <math>V</math> and a diagonal matrix <math>\Sigma</math> such that <math>P = U\Sigma V^T</math>, where [math]\Sigma = \mathrm{diag}(\sigma_1,\sigma_2,\cdots,\sigma_N)[/math] and [math]\sigma_1\geq\sigma_2\geq\cdots\geq\sigma_N[/math] are the singular values of <math>P</math>, arranged in descending order. <math>N</math> is the number of states of <math>P</math>. | | Given the Markov transition matrix <math>P</math> of a system, we can perform singular value decomposition on it to obtain two orthogonal matrices <math>U</math> and <math>V</math> and a diagonal matrix <math>\Sigma</math> such that <math>P = U\Sigma V^T</math>, where [math]\Sigma = \mathrm{diag}(\sigma_1,\sigma_2,\cdots,\sigma_N)[/math] and [math]\sigma_1\geq\sigma_2\geq\cdots\geq\sigma_N[/math] are the singular values of <math>P</math>, arranged in descending order. <math>N</math> is the number of states of <math>P</math>. |
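As an illustration of this decomposition, the following sketch applies NumPy's <code>numpy.linalg.svd</code> to a small transition matrix. The 4-state matrix is hypothetical and chosen only so that the singular values are easy to verify by hand; it is not taken from the paper.

```python
import numpy as np

# Hypothetical 4-state transition matrix; every row is a probability distribution.
P = np.array([
    [0.7, 0.3, 0.0, 0.0],
    [0.3, 0.7, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
])

# numpy returns V^T directly and the singular values in descending order.
U, sigma, Vt = np.linalg.svd(P)

# Rebuild P = U Sigma V^T to confirm the factorization.
reconstructed = U @ np.diag(sigma) @ Vt

print("singular values:", np.round(sigma, 6))
print("rank:", np.linalg.matrix_rank(P))
```

In this toy matrix the last two rows are identical, so one singular value is zero and the rank is 3 rather than 4.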
| | | |
Line 210: |
Line 229: |
| | | |
| Here, [math]\alpha\in(0,2)[/math] is a specified parameter that controls whether [math]\Gamma_{\alpha}[/math] reflects determinism or degeneracy more strongly. Normally we take [math]\alpha = 1[/math], which lets [math]\Gamma_{\alpha}[/math] strike a balance between determinism and degeneracy. | | Here, [math]\alpha\in(0,2)[/math] is a specified parameter that controls whether [math]\Gamma_{\alpha}[/math] reflects determinism or degeneracy more strongly. Normally we take [math]\alpha = 1[/math], which lets [math]\Gamma_{\alpha}[/math] strike a balance between determinism and degeneracy. |
| + | |
| | | |
| In addition, the authors prove that there is an approximate relationship between <math>EI</math> and [math]\Gamma_{\alpha}[/math]: | | In addition, the authors prove that there is an approximate relationship between <math>EI</math> and [math]\Gamma_{\alpha}[/math]: |
Line 218: |
Line 238: |
| | | |
| Moreover, to a certain extent, [math]\Gamma_{\alpha}[/math] can be used instead of EI to measure the degree of causal effect of Markov chains. Therefore, the so-called causal emergence can also be understood as an '''emergence of dynamical reversibility'''. | | Moreover, to a certain extent, [math]\Gamma_{\alpha}[/math] can be used instead of EI to measure the degree of causal effect of Markov chains. Therefore, the so-called causal emergence can also be understood as an '''emergence of dynamical reversibility'''. |
| + | |
| | | |
| =====Quantification of Causal Emergence without Coarse-graining===== | | =====Quantification of Causal Emergence without Coarse-graining===== |
− |
| |
| However, the greatest value of this theory lies in the fact that emergence can be directly quantified without a coarse-graining strategy. If the rank of <math>P</math> is <math>r</math>, that is, starting from the <math>r + 1</math>th singular value, all singular values are 0, then we say that the dynamics <math>P</math> has '''clear causal emergence''', and the numerical value of causal emergence is: | | However, the greatest value of this theory lies in the fact that emergence can be directly quantified without a coarse-graining strategy. If the rank of <math>P</math> is <math>r</math>, that is, starting from the <math>r + 1</math>th singular value, all singular values are 0, then we say that the dynamics <math>P</math> has '''clear causal emergence''', and the numerical value of causal emergence is: |
| | | |
Line 238: |
Line 258: |
| | | |
| The authors give four specific examples of Markov chains, whose state transition matrices are shown in the figure. We can compare the <math>EI</math> and the approximate dynamical reversibility (the <math>\Gamma</math> in the figure, that is, <math>\Gamma_{\alpha = 1}</math>) of each chain. Comparing figures a and b, we find that, across different state transition matrices, when <math>EI</math> decreases, <math>\Gamma</math> decreases as well. Figures c and d compare the situation before and after coarse-graining: figure d is the coarse-graining of the state transition matrix of figure c (merging the first three states into one macroscopic state). Since the macroscopic state transition matrix in figure d is a deterministic system, both the normalized <math>EI</math>, <math>eff\equiv EI/\log N</math>, and the normalized [math]\Gamma[/math], <math>\gamma\equiv \Gamma/N</math>, reach the maximum value of 1. | | The authors give four specific examples of Markov chains, whose state transition matrices are shown in the figure. We can compare the <math>EI</math> and the approximate dynamical reversibility (the <math>\Gamma</math> in the figure, that is, <math>\Gamma_{\alpha = 1}</math>) of each chain. Comparing figures a and b, we find that, across different state transition matrices, when <math>EI</math> decreases, <math>\Gamma</math> decreases as well. Figures c and d compare the situation before and after coarse-graining: figure d is the coarse-graining of the state transition matrix of figure c (merging the first three states into one macroscopic state). Since the macroscopic state transition matrix in figure d is a deterministic system, both the normalized <math>EI</math>, <math>eff\equiv EI/\log N</math>, and the normalized [math]\Gamma[/math], <math>\gamma\equiv \Gamma/N</math>, reach the maximum value of 1. |
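The comparison between <math>EI</math> and <math>\Gamma</math> can be reproduced numerically with a rough sketch. It rests on two assumptions that should be checked against the cited papers: <math>\Gamma_{\alpha}</math> is taken to be the sum of the <math>\alpha</math>-th powers of the singular values of <math>P</math>, and <math>EI</math> is taken to be the average KL divergence (in bits) of each row of <math>P</math> from the mean row. The function names and both matrices are hypothetical and are not the ones shown in the figure.

```python
import numpy as np


def gamma_alpha(P, alpha=1.0):
    """Assumed definition: sum of the alpha-th powers of the singular values."""
    sigma = np.linalg.svd(np.asarray(P, dtype=float), compute_uv=False)
    return float(np.sum(sigma ** alpha))


def effective_information(P):
    """Assumed definition: mean KL divergence (bits) of each row from the mean row."""
    P = np.asarray(P, dtype=float)
    mean_row = P.mean(axis=0)
    total = 0.0
    for row in P:
        mask = row > 0  # terms with zero probability contribute nothing
        total += np.sum(row[mask] * np.log2(row[mask] / mean_row[mask]))
    return float(total / len(P))


# Hypothetical deterministic, non-degenerate dynamics: a cyclic permutation.
P_det = np.roll(np.eye(4), 1, axis=1)
# Hypothetical fully random dynamics: every row is uniform.
P_rand = np.full((4, 4), 0.25)

N = 4
eff_det = effective_information(P_det) / np.log2(N)
gam_det = gamma_alpha(P_det) / N
eff_rand = effective_information(P_rand) / np.log2(N)
gam_rand = gamma_alpha(P_rand) / N

print(eff_det, gam_det)    # both normalized measures are maximal
print(eff_rand, gam_rand)  # both are much lower for the random chain
```

Under these assumptions the permutation matrix attains the maximum of 1 for both normalized measures, while the uniform matrix gives <code>eff = 0</code> and <code>gamma = 1/N</code>, so the two measures move in the same direction.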
| + | |
| | | |
| ====Dynamic independence==== | | ====Dynamic independence==== |
− | Dynamic independence characterizes the situation in which the macroscopic dynamical state obtained by coarse-graining is independent of the microscopic dynamical state [40]. The core idea is that although macroscopic variables are composed of microscopic variables, predicting the future state of the macroscopic variables requires only their own history, with no additional information from the microscopic history. The authors call this phenomenon dynamic independence. It is another means of quantifying emergence, and the macroscopic dynamics in this case is called emergent dynamics. The independence, causal dependence, etc. in the concept of dynamic independence can be quantified by transfer entropy. | + | Dynamic independence characterizes the situation in which the macroscopic dynamical state obtained by coarse-graining is independent of the microscopic dynamical state <ref name=":6">Barnett L, Seth AK. Dynamical independence: discovering emergent macroscopic processes in complex dynamical systems. Physical Review E. 2023 Jul;108(1):014304.</ref>. The core idea is that although macroscopic variables are composed of microscopic variables, predicting the future state of the macroscopic variables requires only their own history, with no additional information from the microscopic history. The authors call this phenomenon dynamic independence. It is another means of quantifying emergence, and the macroscopic dynamics in this case is called emergent dynamics. The independence, causal dependence, etc. in the concept of dynamic independence can be quantified by transfer entropy. |
| + | |
| | | |
| =====Quantification of dynamic independence===== | | =====Quantification of dynamic independence===== |
− |
| |
| Transfer entropy is a non-parametric statistic that measures the amount of directed (time-asymmetric) information transfer between two stochastic processes. The transfer entropy from process <math>X</math> to another process <math>Y</math> can be defined as the degree to which knowing the past values of <math>X</math> can reduce the uncertainty about the future value of <math>Y</math> given the past values of <math>Y</math>. The formula is as follows: | | Transfer entropy is a non-parametric statistic that measures the amount of directed (time-asymmetric) information transfer between two stochastic processes. The transfer entropy from process <math>X</math> to another process <math>Y</math> can be defined as the degree to which knowing the past values of <math>X</math> can reduce the uncertainty about the future value of <math>Y</math> given the past values of <math>Y</math>. The formula is as follows: |
| | | |
Line 249: |
Line 270: |
| | | |
| Here, <math>Y_t</math> represents the macroscopic variable at time <math>t</math>, and <math>X^-_t</math> and <math>Y^-_t</math> represent the microscopic and macroscopic variables before time <math>t</math> respectively. [math]I[/math] is mutual information and [math]H[/math] is Shannon entropy. <math>Y</math> is dynamically decoupled with respect to <math>X</math> if and only if the transfer entropy from <math>X</math> to <math>Y</math> at time <math>t</math> is <math>T_t(X \to Y)=0</math>. | | Here, <math>Y_t</math> represents the macroscopic variable at time <math>t</math>, and <math>X^-_t</math> and <math>Y^-_t</math> represent the microscopic and macroscopic variables before time <math>t</math> respectively. [math]I[/math] is mutual information and [math]H[/math] is Shannon entropy. <math>Y</math> is dynamically decoupled with respect to <math>X</math> if and only if the transfer entropy from <math>X</math> to <math>Y</math> at time <math>t</math> is <math>T_t(X \to Y)=0</math>. |
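To make the decoupling condition concrete, here is a minimal sketch of a plug-in transfer entropy estimate for discrete sequences. It simplifies the definition above by truncating both pasts to a single time step (history length 1), which is an assumption of this example and not part of the original formulation. The function name <code>transfer_entropy</code> and the toy sequences are hypothetical.

```python
from collections import Counter
from math import log2


def transfer_entropy(x, y):
    """Plug-in estimate of T(X -> Y) in bits, using history length 1.

    Computes I(Y_t ; X_{t-1} | Y_{t-1}) from empirical frequencies.
    """
    triples = list(zip(y[1:], y[:-1], x[:-1]))          # (y_t, y_{t-1}, x_{t-1})
    n = len(triples)
    c_full = Counter(triples)
    c_yy = Counter((y1, y0) for y1, y0, _ in triples)   # (y_t, y_{t-1})
    c_yx = Counter((y0, x0) for _, y0, x0 in triples)   # (y_{t-1}, x_{t-1})
    c_y = Counter(y0 for _, y0, _ in triples)           # y_{t-1}
    te = 0.0
    for (y1, y0, x0), c in c_full.items():
        p_given_both = c / c_yx[(y0, x0)]       # p(y_t | y_{t-1}, x_{t-1})
        p_given_self = c_yy[(y1, y0)] / c_y[y0]  # p(y_t | y_{t-1})
        te += (c / n) * log2(p_given_both / p_given_self)
    return te


# Toy "microscopic" sequence with period 4.
x = [0, 0, 1, 1] * 50

# Case 1: y copies x with a one-step lag, so the past of x is needed to predict y.
y_driven = [0] + x[:-1]
# Case 2: y alternates on its own, so its own past already predicts it fully.
y_autonomous = [0, 1] * 100

te_driven = transfer_entropy(x, y_driven)
te_autonomous = transfer_entropy(x, y_autonomous)
print(round(te_driven, 3), te_autonomous)
```

In the second case the estimate is exactly zero, which corresponds to the decoupling condition <math>T_t(X \to Y)=0</math>; in the first case it is close to 1 bit, so the sequence is not dynamically independent of <code>x</code> at this history length.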
| + | |
| | | |
| The concept of dynamic independence can be widely applied to a variety of complex dynamical systems, including neural systems, economic processes, and evolutionary processes. Through the coarse-graining method, the high-dimensional microscopic system can be simplified into a low-dimensional macroscopic system, thereby revealing the emergent structure in complex systems. | | The concept of dynamic independence can be widely applied to a variety of complex dynamical systems, including neural systems, economic processes, and evolutionary processes. Through the coarse-graining method, the high-dimensional microscopic system can be simplified into a low-dimensional macroscopic system, thereby revealing the emergent structure in complex systems. |
| + | |
| | | |
| In the paper, the author conducts experimental verification in a linear system. The experimental process is: 1) Use the linear system to generate parameters and laws; 2) Set the coarse-graining function; 3) Obtain the expression of transfer entropy; 4) Optimize and solve the coarse-graining method of maximum decoupling (corresponding to minimum transfer entropy). Here, the optimization algorithm can use transfer entropy as the optimization goal, and then use the gradient descent algorithm to solve the coarse-graining function, or use the genetic algorithm for optimization. | | In the paper, the author conducts experimental verification in a linear system. The experimental process is: 1) Use the linear system to generate parameters and laws; 2) Set the coarse-graining function; 3) Obtain the expression of transfer entropy; 4) Optimize and solve the coarse-graining method of maximum decoupling (corresponding to minimum transfer entropy). Here, the optimization algorithm can use transfer entropy as the optimization goal, and then use the gradient descent algorithm to solve the coarse-graining function, or use the genetic algorithm for optimization. |
| + | |
| | | |
| =====Example===== | | =====Example===== |
| + | The paper gives an example of a linear dynamical system. Its dynamics is a vector autoregressive model. By using genetic algorithms to iteratively evolve different initial conditions, the degree of dynamical decoupling of the system can also gradually increase. At the same time, it is found that different coarse-graining scales will affect the degree of optimization to dynamic independence. The experiment finds that dynamic decoupling can only be achieved at certain scales, but not at other scales. Therefore, the choice of scale is also very important. |
| | | |
− | The paper gives an example of a linear dynamical system. Its dynamics is a vector autoregressive model. By using genetic algorithms to iteratively evolve different initial conditions, the degree of dynamical decoupling of the system can also gradually increase. At the same time, it is found that different coarse-graining scales will affect the degree of optimization to dynamic independence. The experiment finds that dynamic decoupling can only be achieved at certain scales, but not at other scales. Therefore, the choice of scale is also very important.
| |
| | | |
| ===Comparison of Several Causal Emergence Theories=== | | ===Comparison of Several Causal Emergence Theories=== |
Line 266: |
Line 290: |
| !Method!!Consider Causality?!!Involve Coarse-graining?!!Applicable Dynamical Systems!!Measurement Index | | !Method!!Consider Causality?!!Involve Coarse-graining?!!Applicable Dynamical Systems!!Measurement Index |
| |- | | |- |
− | |Hoel's causal emergence theory [1]||Dynamic causality, the definition of EI introduces do-intervention||Requires specifying a coarse-graining method||Discrete Markov dynamics||Dynamic causality: effective information | + | |Hoel's causal emergence theory <ref name=":0" />||Dynamic causality, the definition of EI introduces do-intervention||Requires specifying a coarse-graining method||Discrete Markov dynamics||Dynamic causality: effective information |
| |- | | |- |
− | |Rosas's causal emergence theory [37]||Approximation by correlation characterized by mutual information||When judged based on synergistic information, no coarse-graining is involved. When calculated based on redundant information, a coarse-graining method needs to be specified.||Arbitrary dynamics||Information decomposition: synergistic information or redundant information | + | |Rosas's causal emergence theory <ref name=":5" />||Approximation by correlation characterized by mutual information||When judged based on synergistic information, no coarse-graining is involved. When calculated based on redundant information, a coarse-graining method needs to be specified.||Arbitrary dynamics||Information decomposition: synergistic information or redundant information |
| |- | | |- |
− | |Causal emergence theory based on reversibility[26]||Dynamic causality, EI is equivalent to approximate dynamical reversibility||Does not depend on a specific coarse-graining strategy||Discrete Markov dynamics||Approximate dynamical reversibility: <math>\Gamma_{\alpha}</math> | + | |Causal emergence theory based on reversibility <ref name=":2"/>||Dynamic causality, EI is equivalent to approximate dynamical reversibility||Does not depend on a specific coarse-graining strategy||Discrete Markov dynamics||Approximate dynamical reversibility: <math>\Gamma_{\alpha}</math> |
| |- | | |- |
− | |Dynamic independence[40]||Granger causality||Requires specifying a coarse-graining method||Arbitrary dynamics||Dynamic independence: transfer entropy | + | |Dynamic independence <ref name=":6"/>||Granger causality||Requires specifying a coarse-graining method||Arbitrary dynamics||Dynamic independence: transfer entropy |
| |} | | |} |
| | | |