There has long been a debate over the ontological and epistemological aspects of causality and emergence.

For example, Yurchenko <ref>Yurchenko, S. B. (2023). Can there be a synergistic core emerging in the brain hierarchy to control neural activity by downward causation? Authorea Preprints.</ref> pointed out that the concept of "causation" is often ambiguous and should be split into two distinct notions, "cause" and "reason", which correspond to ontological and epistemological causality respectively. A cause is what actually and fully brings about an effect, whereas a reason is only an observer's explanation of the effect. A reason need not be as strict as a real cause, but it does provide a certain degree of predictability. There is a similar debate about the nature of causal emergence.

Is causal emergence a real phenomenon that exists independently of any particular observer? It should be emphasized that in Hoel's theory, different coarse-graining strategies lead to different macroscopic dynamical mechanisms and different measurements of causal effect (<math>EI</math>); in essence, different coarse-graining strategies represent different observers. Hoel's theory links emergence with causality through intervention and introduces the concept of causal emergence in a quantitative way. It also proposes a scheme for eliminating the influence of the choice of coarse-graining method: maximizing <math>EI</math>. The coarse-graining scheme that maximizes <math>EI</math> is taken to be the unique objective one, so for a given Markov dynamics, only the <math>EI</math>-maximizing coarse-graining strategy and its corresponding macroscopic dynamics can be regarded as objective results. However, when the <math>EI</math>-maximizing solution is not unique, that is, when several coarse-graining schemes maximize <math>EI</math> simultaneously, the theory runs into difficulty and a certain degree of subjectivity cannot be avoided.

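To make this concrete, here is a minimal sketch in Python (assuming the uniform intervention distribution used in Hoel's framework; the example TPMs are our own illustration, not taken from the literature) of how <math>EI</math> can be computed for a transition probability matrix, and how comparing microscopic and macroscopic values quantifies causal emergence:

<syntaxhighlight lang="python">
import numpy as np

def effective_information(tpm):
    """EI of a Markov TPM under a uniform intervention distribution:
    the mutual information I(X_t; X_{t+1}) with X_t ~ uniform, i.e. the
    average KL divergence between each row and the mean of all rows."""
    n = tpm.shape[0]
    avg = tpm.mean(axis=0)  # effect distribution when X_t is intervened to be uniform
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(tpm > 0, tpm * np.log2(tpm / avg), 0.0)
    return terms.sum() / n

# Four micro states; states 0-2 mix uniformly among themselves, state 3 is fixed.
micro = np.array([
    [1/3, 1/3, 1/3, 0.0],
    [1/3, 1/3, 1/3, 0.0],
    [1/3, 1/3, 1/3, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
# Grouping {0,1,2} into one macro state and {3} into another
# yields deterministic macro dynamics.
macro = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
])
print(effective_information(micro))  # ~0.81 bits
print(effective_information(macro))  # 1 bit > micro EI: causal emergence
</syntaxhighlight>
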
Dewhurst <ref>Dewhurst, J. (2021). Causal emergence from effective information: Neither causal nor emergent? Thought: A Journal of Philosophy, 10(3), 158-168.</ref> provides a philosophical clarification of Hoel's theory, arguing that it is epistemological rather than ontological: Hoel's macroscopic causality is only a causal explanation based on information theory and does not involve "true causality". This also casts doubt on the assumption of a uniform distribution (see the entry for effective information), since there is no evidence that it should be superior to other distributions.

In addition, Hoel's calculation of <math>EI</math> and his quantification of causal emergence depend on two prerequisites: (1) a known microscopic dynamics and (2) a known coarse-graining scheme. In practice, one can rarely obtain both at the same time; in observational studies in particular, both may be unknown. This limitation hinders the practical applicability of Hoel's theory.

At the same time, it has been pointed out that Hoel's theory ignores constraints on the coarse-graining method, and that some coarse-graining methods can lead to ambiguity <ref>Eberhardt, F., & Lee, L. L. (2022). Causal emergence: When distortions in a map obscure the territory. Philosophies, 7(2), 30.</ref>. Moreover, some combinations of state coarse-graining and temporal coarse-graining do not commute. For example, assume that <math>A_{m\times n}</math> is a state coarse-graining operation (combining <math>n</math> states into <math>m</math> states), where the coarse-graining strategy is the one that maximizes the effective information of the macroscopic state transition matrix, and that <math>(\cdot) \times (\cdot)</math> is a temporal coarse-graining operation (combining two time steps into one). Then <math>A_{m\times n}(TPM_{n\times n})</math> denotes coarse-graining an <math>n\times n</math> TPM, where the coarse-graining is simplified as the product of the matrix <math>A</math> and the matrix <math>TPM</math>.

The commutativity condition between state coarse-graining and temporal coarse-graining is then the following equation:

<math>
A_{m\times n}(TPM_{n\times n}) \times A_{m\times n}(TPM_{n\times n}) = A_{m\times n}(TPM_{n\times n} \times TPM_{n\times n})
</math>

The left side first coarse-grains the states of the two consecutive time steps and then multiplies the two single-step TPMs together to obtain a transfer matrix for the two-step evolution; the right side first multiplies the TPMs of the two time steps together to obtain the two-step evolution of the microscopic state and then applies <math>A</math> to obtain the macroscopic TPM. When this equation is not satisfied, the evolution of the macroscopic states differs from the coarse-grained result of evolving the microscopic system, which shows that some kind of consistency constraint must be imposed on the coarse-graining strategy.

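The failure of this commutativity is easy to demonstrate numerically. The sketch below is our own illustration; it assumes, as one common choice, that <math>A</math> averages the rows within each group with uniform weights and sums the columns within each group. It exhibits a partition for which the two sides of the equation differ:

<syntaxhighlight lang="python">
import numpy as np

def coarse_grain(tpm, groups):
    """Aggregate a micro TPM into a macro TPM: rows within a group are
    averaged with uniform weights, columns within a group are summed.
    `groups[k]` is the macro state index of micro state k."""
    groups = np.asarray(groups)
    m = groups.max() + 1
    macro = np.zeros((m, m))
    for gi in range(m):
        rows = tpm[groups == gi]
        for gj in range(m):
            macro[gi, gj] = rows[:, groups == gj].sum(axis=1).mean()
    return macro

groups = [0, 0, 1, 1]
tpm = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.1, 0.4, 0.4],
    [0.0, 0.0, 0.5, 0.5],
    [0.3, 0.3, 0.2, 0.2],
])

lhs = coarse_grain(tpm, groups) @ coarse_grain(tpm, groups)  # coarse-grain, then evolve two steps
rhs = coarse_grain(tpm @ tpm, groups)                        # evolve two steps, then coarse-grain
print(np.allclose(lhs, rhs))  # False: this grouping is inconsistent with the dynamics
</syntaxhighlight>
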
However, as pointed out in the literature <ref name=":6" />, this problem can be alleviated by taking the model's error into account while maximizing <math>EI</math> in a continuous variable space. Machine learning techniques facilitate the learning of causal relationships and causal mechanisms and the identification of emergent properties, but whether results obtained through machine learning reflect ontological causality and emergence, or are merely epistemological, remains undecided. Although introducing machine learning does not settle the debate between ontological and epistemological causality and emergence, it can help reduce subjectivity: the machine learning agent can be regarded as an "objective" observer whose judgments about causality and emergence are independent of human observers. However, the problem of non-unique solutions persists in this approach. Is the result of machine learning ontological or epistemological? The answer is that it is epistemological, with the machine learning algorithm as the epistemic subject. This does not mean that all machine learning results are meaningless: if the learner is well trained and the defined mathematical objective is effectively optimized, the result can also be considered objective, because the algorithm itself is objective and transparent. Combining machine learning methods can help us establish a theoretical framework for observers and study the interaction between observers and the complex systems they observe.

==Related research fields==
Several research fields are closely related to the theory of causal emergence. Here we focus on its differences from and connections with three of them: reduction of dynamical models, dynamic mode decomposition, and simplification of Markov chains.

===Reduction of dynamical models===
An important ingredient of causal emergence is the choice of coarse-graining strategy. When the microscopic model is known, coarse-graining the microscopic state is equivalent to performing '''model reduction''' on the microscopic model. Model reduction is an important subfield of control theory; Antoulas has written a review of the area [64].

Model reduction simplifies and reduces the dimension of a high-dimensional model of complex system dynamics, describing the evolution of the original system with low-dimensional dynamics; this is in effect the coarse-graining process studied in causal emergence. There are two main families of approximation methods for large-scale dynamical systems: approximation methods based on singular value decomposition [64][65] and Krylov-based approximation methods [64][66][67], the latter resting on moment matching. Although the former has many desirable properties, including error bounds, it cannot be applied to systems of high complexity; the latter, by contrast, can be implemented iteratively and is therefore suitable for high-dimensional complex systems. Combining the advantages of the two gives rise to a third family, the SVD/Krylov methods [68][69]. All of these methods evaluate the reduction by an error loss function on the output before and after coarse-graining, so the goal of model reduction is to find the reduced parameter matrix that minimizes this error.
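As a concrete, greatly simplified illustration of the SVD-based family, the following sketch projects a linear system onto its dominant singular subspace; it is a toy stand-in for methods such as balanced truncation, not a reproduction of any specific algorithm from [64], and the system it reduces is our own construction:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 5

# A system x_{t+1} = A x_t whose dynamics are dominated by an r-dimensional subspace
U0, _ = np.linalg.qr(rng.normal(size=(n, r)))
A = U0 @ rng.normal(size=(r, r)) @ U0.T + 0.01 * rng.normal(size=(n, n))
A *= 0.95 / np.abs(np.linalg.eigvals(A)).max()  # rescale to make the system stable

U, s, Vt = np.linalg.svd(A)
V = U[:, :r]               # dominant r-dimensional subspace of the dynamics
A_red = V.T @ A @ V        # r x r reduced model

# Error loss: compare full and reduced two-step predictions
x0 = rng.normal(size=n)
full = A @ (A @ x0)
reduced = V @ (A_red @ (A_red @ (V.T @ x0)))
print(np.linalg.norm(full - reduced) / np.linalg.norm(full))  # small relative error
</syntaxhighlight>
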
In general, the error loss function on the output before and after reduction is used to judge the coarse-graining parameters; this tacitly assumes that reduction loses information, so minimizing the error becomes the only criterion for the effectiveness of a reduction method. From the perspective of causal emergence, however, effective information can increase through dimensionality reduction, and this is the biggest difference between the coarse-graining strategies studied in causal emergence and model reduction in control theory. When the dynamical system is stochastic [70], directly computing the loss function yields unstable values because of the randomness, so the effectiveness of the reduction cannot be measured accurately. Effective information and the causal emergence index defined for stochastic dynamical systems can make the evaluation more reliable and the control-theoretic study of stochastic systems more rigorous.

===Dynamic mode decomposition===
In addition to the reduction of dynamical models, dynamic mode decomposition is closely related to coarse-graining. The basic idea of dynamic mode decomposition (DMD) [71][72] is to extract the dynamics of a flow field directly from data, finding a data-driven mapping between flow-field changes at different frequencies. The method rests on transforming nonlinear, infinite-dimensional dynamics into finite-dimensional linear dynamics, using ideas from the Arnoldi method and singular value decomposition for dimensionality reduction. It draws on key features of time-series models such as ARIMA, SARIMA, and seasonal models, and is widely used in mathematics, physics, finance, and other fields [73]. DMD sorts the system's modes by frequency and extracts its eigenfrequencies, so one can observe the contribution of flow structures at different frequencies to the overall flow field; the eigenvalues of the DMD modes can also be used to predict the flow field. Because the algorithm is theoretically rigorous, stable, and simple, it has been continuously improved as it is applied: for example, it has been combined with the SPA test to verify the effectiveness of stock-price prediction benchmarks, and connected with spectral analysis to simulate the vibration modes of the stock market in the circular economy. Such applications can effectively collect and analyze data and deliver usable results.
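The core of the method fits in a few lines. The sketch below implements the standard exact-DMD procedure (SVD of the snapshot matrix followed by eigendecomposition of the reduced operator); the synthetic two-frequency "flow field" is our own toy example:

<syntaxhighlight lang="python">
import numpy as np

def dmd(X, Xp, r):
    """Exact DMD: fit X' ~ A X through a rank-r SVD of the snapshots.
    Returns the reduced operator, its eigenvalues, and the DMD modes."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Ur, Sinv, Vr = U[:, :r], np.diag(1 / s[:r]), Vt[:r].T
    A_tilde = Ur.T @ Xp @ Vr @ Sinv     # r x r linear operator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = Xp @ Vr @ Sinv @ W          # DMD modes lifted back to full space
    return A_tilde, eigvals, modes

# Synthetic data: two oscillations with distinct spatial profiles and frequencies
t = np.linspace(0, 4 * np.pi, 200)
x = np.linspace(-5, 5, 60)
sech = 1 / np.cosh(x)
data = (np.outer(sech, np.cos(2.3 * t))
        + np.outer(sech * np.tanh(x), np.sin(2.3 * t))
        + np.outer(np.exp(-x**2 / 2), np.cos(1.1 * t))
        + np.outer(x * np.exp(-x**2 / 2), np.sin(1.1 * t)))

_, eigvals, _ = dmd(data[:, :-1], data[:, 1:], r=4)
print(np.abs(eigvals))                     # ~1: purely oscillatory modes
print(np.angle(eigvals) / (t[1] - t[0]))   # recovered frequencies ~ +/-2.3 and +/-1.1
</syntaxhighlight>
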
Dynamic mode decomposition reduces the dimension of the variables, the dynamics, and the observation functions simultaneously by means of a linear transformation [74]. Like model reduction, it resembles the coarse-graining strategies used in causal emergence but takes error minimization as its main optimization objective. Although both model reduction and dynamic mode decomposition are close to model coarse-graining, neither is optimized with respect to effective information; in essence, both accept a certain degree of information loss and do not enhance causal effects. In the literature [75], the authors proved that the error-minimizing solution set in fact contains the solution set that maximizes effective information. Therefore, to optimize causal emergence, one can first minimize the error and then search within the error-minimizing solution set for the best coarse-graining strategy.

===Simplification of Markov chains===
The simplification of Markov chains (also called coarse-graining of Markov chains) is likewise closely related to causal emergence: the coarse-graining process in causal emergence is essentially a simplification of a Markov chain. Model simplification of Markov processes [76] is an important problem in the modeling of state transition systems; it reduces the complexity of a Markov chain by merging multiple states into one.

Simplification serves three main purposes. First, when studying a very large-scale system we do not attend to the changes of every microscopic state; coarse-graining lets us filter out noise and heterogeneity we are not interested in and extract mesoscale or macroscopic laws from the microscopic scale. Second, some states have very similar transition probabilities and can be regarded as the same kind of state; clustering such states (also called partitioning the state space) yields a new, smaller Markov chain and reduces the redundancy of the system's representation. Third, in reinforcement learning with Markov decision processes, coarse-graining the Markov chain reduces the size of the state space and improves training efficiency. In much of the literature, coarse-graining and dimensionality reduction are treated as equivalent [77].

There are two types of coarse-graining of the state space: hard partitioning and soft partitioning. Soft partitioning can be viewed as breaking up the microscopic states and reconstructing macroscopic states from them, allowing superpositions of microscopic states. Hard partitioning is a strict grouping of microscopic states, dividing them into groups without overlap or superposition (see coarse-graining of Markov chains).
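The difference between the two kinds of partition can be expressed with partition matrices, as in the small illustration below (the specific matrices are our own example): a hard partition assigns each microscopic state to exactly one macroscopic state, while a soft partition spreads it over several.

<syntaxhighlight lang="python">
import numpy as np

# Four micro states mapped to two macro states.
# Hard partition: each row contains a single 1 (strict grouping, no overlap).
hard = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
])
# Soft partition: each row is a distribution over macro states,
# so micro states may be superposed into several macro states.
soft = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.0, 1.0],
])

# Either kind maps a distribution over micro states to one over macro states.
p_micro = np.array([0.5, 0.2, 0.2, 0.1])
print(p_micro @ hard)  # [0.7 0.3]
print(p_micro @ soft)  # [0.63 0.37]
</syntaxhighlight>
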
Coarse-graining a Markov chain must be done not only on the state space but also on the transition matrix, that is, the original transition matrix must be simplified according to the state grouping to obtain a new, smaller one; the state vector must be reduced as well. A complete coarse-graining scheme therefore treats the state space, the transition matrix, and the state vector together. This raises a new question: how should the transition probabilities of the new Markov chain obtained from the state grouping be computed, and can the normalization condition be guaranteed?
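One common answer, sketched here under the assumption that the microscopic states within each group <math>A_i</math> are weighted by a probability distribution <math>w</math> (for example uniform weights, or weights from the stationary distribution), defines the macroscopic transition probability as a weighted aggregate:

<math>
p_{A_i \rightarrow A_j} = \sum_{s_k \in A_i} w_k \sum_{s_m \in A_j} p_{s_k \rightarrow s_m}, \qquad \sum_{s_k \in A_i} w_k = 1
</math>

Normalization is then automatic: summing over all groups <math>A_j</math> makes the inner sums cover the whole state space, so <math>\sum_{j} p_{A_i \rightarrow A_j} = \sum_{s_k \in A_i} w_k \cdot 1 = 1</math>, and the coarse-grained matrix is again a valid transition matrix.
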
Beyond these basic guarantees, we usually also require that the coarse-graining operation commute with the transition matrix. This condition ensures that evolving the coarse-grained state vector one step under the coarse-grained transition matrix (the macroscopic dynamics) is equivalent to first evolving the state vector one step under the original transition matrix (the microscopic dynamics) and then coarse-graining. This imposes requirements on both the state grouping (the coarse-graining of the states) and the coarse-graining of the transition matrix, and this commutativity requirement leads to the notion of lumpability of Markov chains.

For any hard partition of the states, one can define the concept of lumpability, a measure of whether states can be aggregated. The concept first appeared in Kemeny and Snell's ''Finite Markov Chains'' [78]. Lumpability is a mathematical condition for judging whether a given hard-partitioned grouping of microscopic states is reducible with respect to the microscopic state transition matrix. Whichever hard-partitioning scheme the state space is classified by, there is a corresponding coarse-graining scheme for the transition matrix and the probability space [79].

Suppose a grouping <math>A=\{A_1, A_2,...,A_r\}</math> of the Markov state space <math>S</math> is given, where each <math>A_i</math> is a subset of <math>S</math> and <math>A_i\cap A_j= \emptyset</math> for <math>i \neq j</math>. Let <math>s^{(t)}</math> denote the microscopic state of the system at time <math>t</math>; the microscopic state space is <math>S=\{s_1, s_2,...,s_n\}</math>, and the microscopic states <math>s_i\in S</math> are the individual elements of the Markov state space. Let the transition probability from microscopic state <math>s_k</math> to <math>s_m</math> be <math>p_{s_k \rightarrow s_m} = p(s^{(t)} = s_m | s^{(t-1)} = s_k)</math>, and the transition probability from microscopic state <math>s_k</math> to macroscopic state <math>A_i</math> be <math>p_{s_k \rightarrow A_i} = p(s^{(t)} \in A_i | s^{(t-1)} = s_k)</math>. Then the necessary and sufficient condition for lumpability is that for every pair <math>A_i, A_j</math>, the value <math>p_{s_k \rightarrow A_j}</math> is the same for every state <math>s_k</math> belonging to <math>A_i</math>, that is,

{{NumBlk|:|
<math>
\begin{aligned}
p_{s_k \rightarrow A_j} = \sum_{s_m \in A_j} p_{s_k \rightarrow s_m} = p_{A_i \rightarrow A_j}, \quad \forall s_k \in A_i
\end{aligned}
</math>
|{{EquationRef|4}}}}

For specific methods of coarse-graining Markov chains, please refer to coarse-graining of Markov chains.
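Condition (4) is straightforward to check numerically. The sketch below is a direct implementation of the definition (our own illustration, not a published algorithm): it tests, for every pair of groups, whether all states in a group share the same probability of jumping into every other group.

<syntaxhighlight lang="python">
import numpy as np

def is_lumpable(tpm, groups, tol=1e-12):
    """Kemeny-Snell lumpability test for a hard partition: for every pair
    of groups (A_i, A_j), p_{s_k -> A_j} must be identical for all s_k in A_i."""
    groups = np.asarray(groups)
    for gi in np.unique(groups):
        rows = tpm[groups == gi]                            # rows of the states in A_i
        for gj in np.unique(groups):
            block_mass = rows[:, groups == gj].sum(axis=1)  # p_{s_k -> A_j}
            if np.ptp(block_mass) > tol:                    # not all equal within A_i
                return False
    return True

tpm = np.array([
    [0.2, 0.3, 0.5, 0.0],
    [0.4, 0.1, 0.2, 0.3],
    [0.0, 0.5, 0.3, 0.2],
    [0.1, 0.4, 0.4, 0.1],
])
print(is_lumpable(tpm, [0, 0, 1, 1]))  # True: block sums agree within each group
print(is_lumpable(tpm, [0, 1, 1, 1]))  # False: states 1, 2, 3 disagree on jumps into {0}
</syntaxhighlight>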