=== Early work on quantifying emergence ===
Several early works attempted to analyze emergence quantitatively. The computational mechanics theory proposed by Crutchfield et al. <ref name=":3">J. P. Crutchfield, K. Young, Inferring statistical complexity, Physical review letters 63 (2) (1989) 105.</ref> considers causal states; it discusses related concepts based on a partition of the state space and is very similar to Erik Hoel's causal emergence theory. Seth et al., on the other hand, proposed the G-emergence theory <ref name=":4">A. K. Seth, Measuring emergence via nonlinear granger causality., in: alife, Vol. 2008, 2008, pp. 545–552.</ref>, which quantifies emergence using Granger causality.

==== Computational mechanics ====
Computational mechanics attempts to express the causal laws of emergence in a quantitative framework, that is, to construct a coarse-grained causal model from a random process such that this model can generate the time series of the observed random process <ref name=":3" />.

Here, the random process can be represented by <math>\overleftrightarrow{s}</math>. Based on time <math>t</math>, the random process can be divided into two parts: the process before time <math>t</math>, <math>\overleftarrow{s_t}</math>, and the process after time <math>t</math>, <math>\overrightarrow{s_t}</math>. Computational mechanics denotes the set of all possible historical processes <math>\overleftarrow{s_t}</math> by <math>\overleftarrow{S}</math>, and the set of all future processes <math>\overrightarrow{s_t}</math> by <math>\overrightarrow{S}</math>.

The goal of computational mechanics is to establish a model that can reconstruct and predict the observed random sequence with a certain degree of accuracy. However, the randomness of the sequence makes a perfect reconstruction impossible, so we need a coarse-grained mapping that captures the ordered structure in the random sequence. This coarse-grained mapping can be characterized by a partitioning function <math>\eta: \overleftarrow{S}\rightarrow \mathcal{R}</math>, which divides <math>\overleftarrow{S}</math> into several mutually exclusive subsets (whose union is the complete set); the resulting set of subsets is denoted <math>\mathcal{R}</math>.

Computational mechanics regards any subset <math>R \in \mathcal{R}</math> as a macroscopic state. For a set of macroscopic states <math>\mathcal{R}</math>, computational mechanics uses Shannon entropy to define an index <math>C_\mu</math> that measures the statistical complexity of these states:

<math>
C_\mu = -\sum_{R \in \mathcal{R}} P(R)\log_2{P(R)}
</math>

It can be proved that when a set of states is used to build a prediction model, their statistical complexity is approximately equivalent to the size of the prediction model.

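As a minimal numerical illustration (not part of the original theory's tooling), the statistical complexity of an already-identified set of macroscopic states can be computed directly from their empirical probabilities; the function name and data below are ours:

<syntaxhighlight lang="python">
import numpy as np

def statistical_complexity(probs):
    """Shannon entropy (bits) of a probability distribution over macroscopic states."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]  # zero-probability states contribute nothing
    return float(-np.sum(p * np.log2(p)))

# Example: three macroscopic states with these occupation probabilities
print(statistical_complexity([0.5, 0.25, 0.25]))  # 1.5 bits
</syntaxhighlight>
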
Furthermore, in order to achieve the best balance between predictability and simplicity for the set of macroscopic states, computational mechanics defines the concept of causal equivalence: if <math>P\left ( \overrightarrow{s}|\overleftarrow{s}\right )=P\left ( \overrightarrow{s}|{\overleftarrow{s}}'\right )</math>, then <math>\overleftarrow{s}</math> and <math>{\overleftarrow{s}}'</math> are causally equivalent. This equivalence relation divides all historical processes into equivalence classes, which are defined as causal states. All causal states of a historical process <math>\overleftarrow{s}</math> can be characterized by a mapping <math>\epsilon \left ( \overleftarrow{s} \right )</math>, where <math>\epsilon: \overleftarrow{\mathcal{S}}\rightarrow 2^{\overleftarrow{\mathcal{S}}}</math> maps the historical process <math>\overleftarrow{s}</math> to the causal state <math>\epsilon(\overleftarrow{s})\in 2^{\overleftarrow{\mathcal{S}}}</math>.

Further, we can denote the causal transition probability between two causal states <math>S_i</math> and <math>S_j</math> as <math>T_{ij}^{\left ( s \right )}</math>, which plays the role of a coarse-grained macroscopic dynamics. The <math>\epsilon</math>-machine of a random process is defined as the ordered pair <math>\left \{ \epsilon,T \right \}</math>: a pattern discovery machine that achieves prediction by learning the <math>\epsilon</math> and <math>T</math> functions. This amounts to defining the so-called identification problem of emergent causality, with the <math>\epsilon</math>-machine as a machine that attempts to discover emergent causality in data.

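As an informal illustration of causal equivalence, the toy sketch below groups length-<math>k</math> histories of a symbol sequence whose empirical next-symbol distributions (approximately) coincide. Genuine <math>\epsilon</math>-machine reconstruction handles variable-length histories and proper statistical tests, so the fixed history length, the tolerance test, and all names here are our simplifications rather than Crutchfield's algorithm:

<syntaxhighlight lang="python">
from collections import defaultdict
import numpy as np

def causal_states(seq, k=2, tol=0.05):
    """Group length-k histories whose next-symbol distributions nearly coincide."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(seq) - k):
        counts[seq[i:i + k]][seq[i + k]] += 1
    symbols = sorted({s for c in counts.values() for s in c})
    dists = {h: np.array([c.get(s, 0) for s in symbols], float) / sum(c.values())
             for h, c in counts.items()}
    states = []  # each entry: a list of causally equivalent histories
    for h, d in dists.items():
        for state in states:
            if np.abs(d - dists[state[0]]).max() < tol:  # causal equivalence test
                state.append(h)
                break
        else:
            states.append([h])
    return states

# Period-2 sequence: histories "01" and "10" fall into different causal states
print(causal_states("01" * 100 + "0"))
</syntaxhighlight>

Each returned group plays the role of one causal state; estimating the transition probabilities <math>T_{ij}</math> between the groups would complete the ordered pair <math>\left \{ \epsilon,T \right \}</math>.
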
Computational mechanics proves that the causal states obtained through the <math>\epsilon</math>-machine have three important properties: "maximum predictability", "minimum statistical complexity", and "minimum randomness", which make the construction optimal in a certain sense. In addition, the authors introduce a hierarchical machine reconstruction algorithm that can compute causal states and <math>\epsilon</math>-machines from observational data. Although this algorithm may not be applicable to all scenarios, the authors take chaotic dynamics, hidden Markov models, and cellular automata as examples and give numerical results together with the corresponding machine reconstruction paths <ref name="The_calculi_of_emergence">{{cite journal|author1=Crutchfield, J.P|title=The calculi of emergence: computation, dynamics and induction|journal=Physica D: Nonlinear Phenomena|year=1994|volume=75|issue=1-3|page=11-54|url=https://www.sciencedirect.com/science/article/abs/pii/0167278994902739}}</ref>.

Although the original computational mechanics does not give an explicit definition and quantitative theory of emergence, some researchers later advanced the theory further. Shalizi et al. discussed the relationship between computational mechanics and emergence: if process <math>{\overleftarrow{s}}'</math> has higher prediction efficiency than process <math>\overleftarrow{s}</math>, then emergence occurs in process <math>{\overleftarrow{s}}'</math>. The prediction efficiency <math>e</math> of a process is defined as the ratio of its excess entropy to its statistical complexity (<math>e=\frac{E}{C_{\mu}}</math>); <math>e</math> is a real number between 0 and 1 and can be regarded as the fraction of the historical memory stored in the process that is useful for prediction. <math>C_{\mu}=0</math> in two cases: either the process is completely uniform and deterministic, or it is independently and identically distributed. In both cases there cannot be any interesting predictions, so we set <math>e=0</math>. The authors also explain that emergence can be understood as a dynamical process in which a pattern gains the ability to adapt to different environments.

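As a hedged numerical sketch of this criterion: given estimates of the excess entropy <math>E</math> and the statistical complexity <math>C_\mu</math> of two processes, their prediction efficiencies can be compared directly. Estimating <math>E</math> and <math>C_\mu</math> from data is a separate, harder problem; the numbers below are invented for illustration:

<syntaxhighlight lang="python">
def prediction_efficiency(E, C_mu):
    """e = E / C_mu, with e = 0 when C_mu = 0 (uniform-deterministic or i.i.d. process)."""
    return 0.0 if C_mu == 0 else E / C_mu

# Hypothetical estimates for a microscopic process and a coarse-grained one
e_micro = prediction_efficiency(E=0.4, C_mu=2.0)  # 0.2
e_macro = prediction_efficiency(E=0.4, C_mu=0.8)  # 0.5
print(e_macro > e_micro)  # True: by this criterion, emergence occurs in the macro process
</syntaxhighlight>
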
The causal emergence framework has many similarities with computational mechanics: all historical processes <math>\overleftarrow{s}</math> can be regarded as microscopic states; every <math>R \in \mathcal{R}</math> corresponds to a macroscopic state; the function <math>\eta</math> can be understood as a possible coarse-graining function; the causal state <math>\epsilon \left ( \overleftarrow{s} \right )</math> is a special state with at least the same predictive power as the microscopic state <math>\overleftarrow{s}</math>, so <math>\epsilon</math> can be understood as an effective coarse-graining strategy; and the causal transition <math>T</math> corresponds to effective macroscopic dynamics. The minimum-randomness property characterizes the determinism of the macroscopic dynamics and can be measured by effective information in causal emergence.

==== G-emergence ====
The G-emergence theory was proposed by Seth in 2008 and is one of the earliest studies to quantify emergence from a causal perspective <ref name=":4" />. Its basic idea is to use nonlinear Granger causality to quantify weak emergence in complex systems.

Specifically, consider a bivariate autoregressive model for prediction. When there are only two variables A and B, the model contains two equations, one for each variable, and the current value of each variable is predicted from its own past values and those of the other variable within a certain time-lag range. The model also yields residuals, which can be understood as prediction errors and used to measure the degree of Granger causal effect (called G-causality) of each equation. The degree to which B is a Granger cause (G-cause) of A is calculated as the logarithm of the ratio of two residual variances: the residual variance of A's autoregressive model when B is omitted, and that of the full prediction model (including A and B). In addition, the author defines the concept of "G-autonomous", which measures the extent to which the past values of a time series can predict its own future values. The strength of this autonomous predictive causal effect can be characterized similarly to G-causality.

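The following sketch estimates these quantities with ordinary least squares. It is a minimal illustration under simplifying assumptions (a single lag, no nonlinearity, no significance testing), not Seth's full nonlinear implementation; all function and variable names are ours:

<syntaxhighlight lang="python">
import numpy as np

def residual_var(y, X):
    """Variance of ordinary-least-squares residuals of y regressed on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.var(y - X @ beta))

def g_cause(a, b, lag=1):
    """Degree to which b G-causes a: log ratio of restricted vs. full residual variance."""
    y = a[lag:]
    ones = np.ones(len(y))
    restricted = np.column_stack([a[:-lag], ones])           # a's own past only
    full = np.column_stack([a[:-lag], b[:-lag], ones])       # a's and b's past
    return float(np.log(residual_var(y, restricted) / residual_var(y, full)))

def g_autonomous(a, b, lag=1):
    """Degree to which a's own past predicts a, beyond what b's past predicts."""
    y = a[lag:]
    ones = np.ones(len(y))
    restricted = np.column_stack([b[:-lag], ones])           # b's past only
    full = np.column_stack([a[:-lag], b[:-lag], ones])
    return float(np.log(residual_var(y, restricted) / residual_var(y, full)))

rng = np.random.default_rng(0)
b = rng.normal(size=1000)
a = np.roll(b, 1) + 0.1 * rng.normal(size=1000)  # a is driven by b's past
print(g_cause(a, b))  # clearly positive: b G-causes a
print(g_cause(b, a))  # near zero: a does not G-cause b
</syntaxhighlight>
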
[[文件:G Emergence Theory.png|G-emergence theory diagram|alt=G-emergence theory diagram|left|400x300px]]

As shown in the figure above, we can judge the occurrence of emergence based on the two basic concepts of G-causality (the resulting measure of emergence based on Granger causality is denoted G-emergence). Let A be a macroscopic variable and B a microscopic variable. Two conditions must hold for emergence to occur: 1) A is G-autonomous with respect to B; 2) B is a G-cause of A. The degree of G-emergence is calculated by multiplying the degree of A's G-autonomy by the average degree to which the microscopic variables B G-cause A.

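Using the g_cause and g_autonomous helpers sketched above, the G-emergence score could be assembled as follows. This is again only one illustrative reading of the definition: handling several microscopic variables by conditioning the macro variable's autonomy on their mean is our simplification, not Seth's formulation:

<syntaxhighlight lang="python">
import numpy as np

def g_emergence(macro, micros, lag=1):
    """A's G-autonomy times the average degree to which the micro variables G-cause A."""
    reference = np.mean(micros, axis=0)  # assumption: summarize the micros by their mean
    autonomy = g_autonomous(macro, reference, lag)
    mean_cause = np.mean([g_cause(macro, m, lag) for m in micros])
    return float(autonomy * mean_cause)
</syntaxhighlight>
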
The G-emergence theory proposed by Seth is the first attempt to use causal measures to quantify emergence phenomena. However, the causal relationship it relies on is Granger causality, which is not strict causality, and the measurement results also depend on the regression method used. In addition, the index is defined on variables rather than on dynamics, so the results depend on the choice of variables. These are the drawbacks of the G-emergence theory.

The causal emergence framework also has similarities with G-emergence: in both methods, the macroscopic states need to be selected manually. It should also be noted that the above methods for quantifying emergence generally do not consider true interventionist causality.

==== Other theories for quantitatively characterizing emergence ====
In addition, there are other quantitative theories of emergence; two methods are widely discussed. The first understands emergence as a process from disorder to order. Moez Mnif and Christian Müller-Schloer <ref>Mnif, M.; Müller-Schloer, C. Quantitative emergence. In Organic Computing—A Paradigm Shift for Complex Systems; Springer: Basel, Switzerland, 2011; pp. 39–52.</ref> use Shannon entropy to measure order and disorder: in a self-organization process, emergence occurs when order increases, and the increase in order is calculated as the difference in Shannon entropy between the initial state and the final state. A defect of this method is that it depends on the abstraction level of the observation and on the initial conditions of the system; to overcome these two difficulties, the authors propose a measurement relative to the maximum entropy distribution. Inspired by Mnif and Müller-Schloer's work, reference <ref>Fisch, D.; Jänicke, M.; Sick, B.; Müller-Schloer, C. Quantitative emergence–A refined approach based on divergence measures. In Proceedings of the 2010 Fourth IEEE International Conference on Self-Adaptive and Self-Organizing Systems, Budapest, Hungary, 27 September–1 October 2010; IEEE Computer Society: Washington, DC, USA, 2010; pp. 94–103.</ref> suggests using the divergence between two probability distributions to quantify emergence, understanding emergence as an unexpected or unpredictable distribution change based on the observed samples. This method, however, suffers from high computational complexity and low estimation accuracy. To solve these problems, reference <ref>Fisch, D.; Fisch, D.; Jänicke, M.; Kalkowski, E.; Sick, B. Techniques for knowledge acquisition in dynamically changing environments. ACM Trans. Auton. Adapt. Syst. (TAAS) 2012, 7, 1–25.</ref> further proposes an approximate estimation of the density using Gaussian mixture models and introduces the Mahalanobis distance to characterize the difference between the data and the Gaussian components, obtaining better results. In addition, Holzer, de Meer et al. <ref>Holzer, R.; De Meer, H.; Bettstetter, C. On autonomy and emergence in self-organizing systems. In International Workshop on Self-Organizing Systems, Proceedings of the Third International Workshop, IWSOS 2008, Vienna, Austria, 10–12 December 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 157–169.</ref><ref>Holzer, R.; de Meer, H. Methods for approximations of quantitative measures in self-organizing systems. In Proceedings of the Self-Organizing Systems: 5th International Workshop, IWSOS 2011, Karlsruhe, Germany, 23–24 February 2011; Proceedings 5; Springer: Berlin/Heidelberg, Germany, 2011; pp. 1–15.</ref> proposed another emergence measurement method based on Shannon entropy: they view a complex system as a self-organizing process in which different individuals interact through communication, and emergence can then be measured by the ratio between the Shannon entropy of all communications between agents and the sum of the Shannon entropies of the agents as separate sources.

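A minimal sketch of the entropy-difference idea, assuming discrete observations at some fixed abstraction level (the binning and the two-snapshot comparison are our simplifications):

<syntaxhighlight lang="python">
import numpy as np

def shannon_entropy(samples):
    """Entropy (bits) of the empirical distribution of discrete samples."""
    _, counts = np.unique(samples, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def emergence_entropy_diff(initial, final):
    """Mnif & Müller-Schloer-style measure: increase in order = H(start) - H(end)."""
    return shannon_entropy(initial) - shannon_entropy(final)

start = np.random.default_rng(0).integers(0, 8, 1000)  # disordered: about 3 bits
end = np.zeros(1000, dtype=int)                        # fully ordered: 0 bits
print(emergence_entropy_diff(start, end))              # close to 3
</syntaxhighlight>
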
The second method understands emergence from the perspective of "the whole is greater than the sum of its parts" <ref>Teo, Y.M.; Luong, B.L.; Szabo, C. Formalization of emergence in multi-agent systems. In Proceedings of the 1st ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, Montreal, QC, Canada, 19–22 May 2013; pp. 231–240.</ref><ref>Szabo, C.; Teo, Y.M. Formalization of weak emergence in multiagent systems. ACM Trans. Model. Comput. Simul. (TOMACS) 2015, 26, 1–25.</ref>. This method defines emergence from interaction rules and the states of agents rather than from a statistical measure of the system as a whole. Specifically, the measure is the difference of two terms: the first term describes the collective state of the entire system, while the second term is the sum of the individual states of all components. The measure emphasizes that emergence arises from the interactions and collective behavior of the system.

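Read literally as a difference of two terms, the measure could be sketched as below. What plays the role of the "collective state" term and the "individual state" terms is model-specific in the original formalizations, so this is only a placeholder reading of the subtraction, with made-up inputs:

<syntaxhighlight lang="python">
def whole_minus_parts(collective_term, component_terms):
    """Difference between a collective-state term and the sum of per-component terms."""
    return collective_term - sum(component_terms)

# Hypothetical numbers: a system-level quantity vs. the summed component quantities
print(whole_minus_parts(1.7, [0.3, 0.4, 0.2]))  # 0.8 > 0: "whole exceeds the sum of parts"
</syntaxhighlight>
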
=== Causal emergence theory based on effective information ===