Causal Emergence

Causal emergence refers to a special kind of emergence phenomenon in dynamical systems, that is, the system will exhibit stronger causal characteristics on a macroscopic scale. In particular, for a class of Markov dynamical systems, after appropriate coarse-graining of its state space, the formed macroscopic dynamics will exhibit stronger causal characteristics than the microscopic ones. Then it is said that causal emergence occurs in this system [1][2]. At the same time, the causal emergence theory is also a theory that uses causal effect measures to quantify emergence phenomena in complex systems.

1. History

Development of related concepts

The causal emergence theory is a theory that attempts to answer the question of what emergence is from a phenomenological perspective using a causal-based quantitative research method. Therefore, the development of causal emergence is closely related to people's understanding and development of concepts such as emergence and causality.

Emergence

Emergence has always been an important characteristic in complex systems and a core concept in many discussions about system complexity and the relationship between the macroscopic and microscopic levels [3][4]. Emergence can be simply understood as the whole being greater than the sum of its parts, that is, the whole exhibits new characteristics that the individuals constituting it do not possess [5]. Although scholars have pointed out the existence of emergence phenomena in various fields [4][6], such as the collective behavior of birds [7], the formation of consciousness in the brain, and the emergent capabilities of large language models [8], there is currently no universally accepted unified understanding of this phenomenon. Previous research on emergence mostly stayed at the qualitative stage. For example, Bedau et al. [9][10] conducted classified research on emergence, dividing emergence into nominal emergence [11][12], weak emergence [9][13], and strong emergence [14][15].

Nominal emergence can be understood as attributes and patterns that can be possessed by the macroscopic level but not by the microscopic level. For example, the shape of a circle composed of several pixels is a kind of nominal emergence [11][12].

Weak emergence refers to the fact that macroscopic-level attributes or processes are generated by complex interactions between individual components. Or weak emergence can also be understood as a characteristic that can be simulated by a computer in principle. Due to the principle of computational irreducibility, even if weak emergence characteristics can be simulated, they still cannot be easily reduced to microscopic-level attributes. For weak emergence, the causes of its pattern generation may come from both microscopic and macroscopic levels [14][15]. Therefore, the causal relationship of emergence may coexist with microscopic causal relationships.

As for strong emergence, there are many controversies. It refers to macroscopic-level attributes that cannot be reduced to microscopic-level attributes in principle, including the interactions between individuals. In addition, Jochen Fromm further interprets strong emergence as the causal effect of downward causation [16]. Downward causation refers to the causal force from the macroscopic level to the microscopic level. However, there are many controversies about the concept of downward causation itself [17][18].

From these early studies, it can be seen that emergence has a natural and profound connection with causality.

Causality and its measurement

The so-called causality refers to the mutual influence between events. Causality is not equal to correlation, which is manifested in that not only will B occur when A occurs, but also if A does not occur, then B will not occur. Only by intervening in event A and then examining the result of B can people detect whether there is a causal relationship between A and B.

With the further development of causal science in recent years, people can use a mathematical framework to quantify causality. Causality describes the causal effect of a dynamical process [19][20][21]. Judea Pearl [21] uses probabilistic graphical models to describe causal interactions. Pearl uses different models to distinguish and quantify three levels of causality. Here we are more concerned with the second level in the causal ladder: intervening in the input distribution. In addition, due to the uncertainty and ambiguity behind the discovered causal relationships, measuring the degree of causal effect between two variables is another important issue. Many independent historical studies have addressed the issue of measuring causal relationships. These measurement methods include Hume's concept of constant connection [22] and value function-based methods [23], Eells and Suppes' probabilistic causal measures [24][25], and Judea Pearl's causal measure indicators, etc. [19].

Causal emergence

Emergence and causality are also interconnected: on the one hand, emergence is the causal effect of complex nonlinear interactions among the components of a complex system; on the other hand, emergent characteristics will also have causal relationships with individuals in complex systems. In addition, in the past, people were accustomed to attributing macroscopic factors to the influence of microscopic factors. However, macroscopic emergent patterns often cannot find microscopic attributions, so corresponding causes cannot be found. Thus, there is a profound connection between emergence and causality. On the other hand, although we have a qualitative classification of emergence, we cannot quantitatively characterize the occurrence of emergence. Therefore, we can use causality to quantitatively characterize the occurrence of emergence.

In 2013, Erik Hoel, an American theoretical neurobiologist, tried to introduce causality into the measurement of emergence, proposed the concept of causal emergence, and used effective information (EI for short) to quantify the strength of causality in system dynamics [1][2]. Causal emergence can be described as: when a system has a stronger causal effect on a macroscopic scale compared to its microscopic scale, causal emergence occurs. Causal emergence well characterizes the differences and connections between the macroscopic and microscopic states of a system. At the same time, it combines the two core concepts of causality in artificial intelligence and emergence in complex systems. Causal emergence also provides scholars with a quantitative perspective to answer a series of philosophical questions. For example, the top-down causal characteristics in life systems or social systems can be discussed with the help of the causal emergence framework. The top-down causation here refers to downward causation [26], indicating the existence of macroscopic-to-microscopic causal effects. For example, in the phenomenon of a gecko breaking its tail. When in danger, the gecko breaks its tail directly without asking for the tail's advice. Here, the whole is the cause and the tail is the effect. Then there is a causal force from the whole pointing to the part.

Early work on quantifying emergence

There have been some related works in the early stage that attempted to quantitatively analyze emergence. The computational mechanics theory proposed by Crutchfield et al. [27] considers causal states. This method discusses related concepts based on the division of state space and is very similar to Erik Hoel's causal emergence theory. On the other hand, Seth et al. proposed the G-emergence theory [28] to quantify emergence by using Granger causality.

Computational mechanics

The computational mechanics theory attempts to express the causal laws of emergence in a quantitative framework, that is, how to construct a coarse-grained causal model from a random process so that this model can generate the time series of the observed random process [27].

Here, the random process can be represented by [math]\displaystyle{ \overleftrightarrow{s} }[/math]. Based on time [math]\displaystyle{ t }[/math], the random process can be divided into two parts: the process before time [math]\displaystyle{ t }[/math] and the process after time [math]\displaystyle{ t }[/math], [math]\displaystyle{ \overleftarrow{s_t} }[/math] and [math]\displaystyle{ \overrightarrow{s_t} }[/math]. Computational mechanics records the set of all possible historical processes [math]\displaystyle{ \overleftarrow{s_t} }[/math] as [math]\displaystyle{ \overleftarrow{S} }[/math], and the set of all future processes as [math]\displaystyle{ \overrightarrow{S} }[/math].

The goal of computational mechanics is to establish a model that hopes to reconstruct and predict the observed random sequence in a certain degree of accuracy. However, the randomness of the sequence makes it impossible for us to obtain a perfect reconstruction. Therefore, we need a coarse-grained mapping to capture the ordered structure in the random sequence. This coarse-grained mapping can be characterized by a partitioning function [math]\displaystyle{ \eta: \overleftarrow{S}→\mathcal{R} }[/math], which can divide [math]\displaystyle{ \overleftarrow{S} }[/math] into several mutually exclusive subsets (all mutually exclusive subsets form the complete set), and the formed set is denoted as [math]\displaystyle{ \mathcal{R} }[/math].

Computational mechanics regards any subset [math]\displaystyle{ R \in \mathcal{R} }[/math] as a macroscopic state. For a set of macroscopic state sets [math]\displaystyle{ \mathcal{R} }[/math], computational mechanics uses Shannon entropy to define its statistical complexity index [math]\displaystyle{ C_\mu }[/math] to measure the complexity of the state. The following equation:

[math]\displaystyle{ C_\mu(\mathcal{R})\triangleq -\sum_{\rho\in \mathcal{R}} P(\mathcal{R}=\rho)\log_2 P(\mathcal{R}=\rho) }[/math]

It can be proved that when a set of states is used to build a prediction model, statistical complexity is approximately equivalent to the size of the prediction model.

Furthermore, in order to achieve the best balance between predictability and simplicity for the set of macroscopic states, computational mechanics defines the concept of causal equivalence. If [math]\displaystyle{ P\left ( \overrightarrow{s}|\overleftarrow{s}\right )=P\left ( \overrightarrow{s}|{\overleftarrow{s}}'\right ) }[/math], then [math]\displaystyle{ \overleftarrow{s} }[/math] and [math]\displaystyle{ {\overleftarrow{s}}' }[/math] are causally equivalent. This equivalence relation can divide all historical processes into equivalence classes and define them as causal states. All causal states of the historical process [math]\displaystyle{ \overleftarrow{s} }[/math] can be characterized by a mapping [math]\displaystyle{ \epsilon \left ( \overleftarrow{s} \right ) }[/math]. Here, [math]\displaystyle{ \epsilon: \overleftarrow{\mathcal{S}}\rightarrow 2^{\overleftarrow{\mathcal{S}}} }[/math] is a function that maps the historical process [math]\displaystyle{ \overleftarrow{s} }[/math] to the causal state [math]\displaystyle{ \epsilon(\overleftarrow{s})\in 2^{\overleftarrow{\mathcal{S}}} }[/math].

Further, we can denote the causal transition probability between two causal states [math]\displaystyle{ S_i }[/math] and [math]\displaystyle{ S_j }[/math] as [math]\displaystyle{ T_{ij}^{\left ( s \right )} }[/math], which is similar to a coarsened macroscopic dynamics. The [math]\displaystyle{ \epsilon }[/math]-machine of a random process is defined as an ordered pair [math]\displaystyle{ \left \{ \epsilon,T \right \} }[/math]. This is a pattern discovery machine that can achieve prediction by learning the [math]\displaystyle{ \epsilon }[/math] and [math]\displaystyle{ T }[/math] functions. This is equivalent to defining the so-called identification problem of emergent causality. Here, the [math]\displaystyle{ \epsilon }[/math]-machine is a machine that attempts to discover emergent causality in data.

Computational mechanics can prove that the causal states obtained through the [math]\displaystyle{ \epsilon }[/math]-machine have three important characteristics: "maximum predictability", "minimum statistical complexity", and "minimum randomness", and it is verified that it is optimal in a certain sense. In addition, the author introduces a hierarchical machine reconstruction algorithm that can calculate causal states and [math]\displaystyle{ \epsilon }[/math]-machines from observational data. Although this algorithm may not be applicable to all scenarios, the author takes chaotic dynamics, hidden Markov models, and cellular automata as examples and gives numerical calculation results and corresponding machine reconstruction paths.

Although the original computational mechanics does not give a clear definition and quantitative theory of emergence, some researchers later further advanced the development of this theory. Shalizi et al. discussed the relationship between computational mechanics and emergence in their work. If process [math]\displaystyle{ {\overleftarrow{s}}' }[/math] has higher prediction efficiency than process [math]\displaystyle{ \overleftarrow{s} }[/math], then emergence occurs in process [math]\displaystyle{ {\overleftarrow{s}}' }[/math]. The prediction efficiency [math]\displaystyle{ e }[/math] of a process is defined as the ratio of its excess entropy to its statistical complexity ([math]\displaystyle{ e=\frac{E}{C_{\mu}} }[/math]). [math]\displaystyle{ e }[/math] is a real number between 0 and 1. We can regard it as a part of the historical memory stored in the process. In two cases, [math]\displaystyle{ C_{\mu}=0 }[/math]. One is that this process is completely uniform and deterministic; the other is that it is independently and identically distributed. In both cases, there cannot be any interesting predictions, so we set [math]\displaystyle{ e=0 }[/math]. At the same time, the author explains that emergence can be understood as a dynamical process in which a pattern gains the ability to adapt to different environments.

The causal emergence framework has many similarities with computational mechanics. All historical processes [math]\displaystyle{ \overleftarrow{s} }[/math] can be regarded as microscopic states. All [math]\displaystyle{ R \in \mathcal{R} }[/math] correspond to macroscopic states. The function [math]\displaystyle{ \eta }[/math] can be understood as a possible coarse-graining function. The causal state [math]\displaystyle{ \epsilon \left ( \overleftarrow{s} \right ) }[/math] is a special state that can at least have the same predictive power as the microscopic state [math]\displaystyle{ \overleftarrow{s} }[/math]. Therefore, [math]\displaystyle{ \epsilon }[/math] can be understood as an effective coarse-graining strategy. Causal transfer [math]\displaystyle{ T }[/math] corresponds to effective macroscopic dynamics. The characteristic of minimum randomness characterizes the determinism of macroscopic dynamics and can be measured by effective information in causal emergence.

G-emergence

The G-emergence theory was proposed by Seth in 2008 and is one of the earliest studies to quantitatively quantify emergence from a causal perspective [28]. The basic idea is to use nonlinear Granger causality to quantify weak emergence in complex systems.

Specifically, if we use a binary autoregressive model for prediction, when there are only two variables A and B, there are two equations in the autoregressive model, each equation corresponds to one of the variables, and the current value of each variable is composed of its own value and the value of the other variable within a certain time lag range. In addition, the model also calculates residuals. Here, residuals can be understood as prediction errors and can be used to measure the degree of Granger causal effect (called G-causality) of each equation. The degree to which B is a Granger cause (G-cause) of A is calculated by taking the logarithm of the ratio of the two residual variances, one being the residual of A's autoregressive model when B is omitted, and the other being the residual of the full prediction model (including A and B). In addition, the author also defines the concept of G-autonomous, which represents a measure of the extent to which the past values of a time series can predict its own future values. The strength of this autonomous predictive causal effect can be characterized in a similar way to G-causality.

As shown in the above figure, we can judge the occurrence of emergence based on the two basic concepts in the above G-causality (here is the measure of emergence based on Granger causality, denoted as G-emergence). If A is understood as a macroscopic variable and B is understood as a microscopic variable. The conditions for emergence to occur include two: 1) A is G-autonomous with respect to B; 2) B is a G-cause of A. The degree of G-emergence is calculated by multiplying the degree of A's G-autonomous by the degree of B's average G-cause.

The G-emergence theory proposed by Seth is the first attempt to use causal measures to quantify emergence phenomena. However, the causal relationship used by the author is Granger causality, which is not a strict causal relationship. At the same time, the measurement results also depend on the regression method used. In addition, the measurement index of this method is defined according to variables rather than dynamics, which means that the results will depend on the choice of variables. These all constitute the drawbacks of the G-emergence theory.

The causal emergence framework also has similarities with the aforementioned G-emergence. The macroscopic states of both methods need to be manually selected. In addition, it should be noted that some of the above methods for quantitatively quantifying emergence often do not consider true interventionist causality.

Other theories for quantitatively characterizing emergence

In addition, there are some other quantitative theories of emergence. There are mainly two methods that are widely discussed. One is to understand emergence from the process from disorder to order. Moez Mnif and Christian Müller-Schloer [30] use Shannon entropy to measure order and disorder. In the self-organization process, emergence occurs when order increases. The increase in order is calculated by measuring the difference in Shannon entropy between the initial state and the final state. However, the defect of this method is that it depends on the abstract observation level and the initial conditions of the system. To overcome these two difficulties, the authors propose a measurement method compared with the maximum entropy distribution. Inspired by the work of Moez mif and Christian Müller-Schloer, reference [31] suggests using the divergence between two probability distributions to quantify emergence. They understand emergence as an unexpected or unpredictable distribution change based on the observed samples. But this method has disadvantages such as large computational complexity and low estimation accuracy. To solve these problems, reference [32] further proposes an approximate method for estimating density using Gaussian mixture models and introduces Mahalanobis distance to characterize the difference between data and Gaussian components, thus obtaining better results. In addition, Holzer and de Meer [33][34] and others proposed another emergence measurement method based on Shannon entropy. They believe that a complex system is a self-organizing process in which different individuals interact through communication. Then, we can measure emergence according to the ratio between the Shannon entropy measure of all communications between agents and the sum of Shannon entropies as separate sources.

Another method is to understand emergence from the perspective of "the whole is greater than the sum of its parts" [35][36]. This method defines emergence from interaction rules and the states of agents rather than statistically measuring from the totality of the entire system. Specifically, this measure consists of subtracting two terms. The first term describes the collective state of the entire system, while the second term represents the sum of the individual states of all components. This measure emphasizes that emergence arises from the interactions and collective behavior of the system.

Causal emergence theory based on effective information

In history, the first relatively complete and explicit quantitative theory that uses causality to define emergence is the causal emergence theory proposed by Erik Hoel, Larissa Albantakis, and Giulio Tononi [1][2]. This theory defines so-called causal emergence for Markov chains as the phenomenon that the coarsened Markov chain has a greater causal effect strength than the original Markov chain. Here, the causal effect strength is measured by effective information. This indicator is a modification of the mutual information indicator. The main difference is that the state variable at time $t$ is intervened by do-intervention and transformed into a uniform distribution (or maximum entropy distribution). The effective information indicator was proposed by Giulio Tononi as early as 2003 when studying integrated information theory. As Giulio Tononi's student, Erik Hoel applied effective information to Markov chains and proposed the causal emergence theory based on effective information.

Causal Emergence Theory Based on Information Decomposition

In addition, in 2020, Rosas et al. proposed a method based on information decomposition to define causal emergence in systems from an information theory perspective and quantitatively characterize emergence based on synergistic information or redundant information. The so-called information decomposition is a new method to analyze the complex interrelationships of various variables in complex systems. By decomposing information, each partial information is represented by an information atom. At the same time, each partial information is projected into the information atom with the help of an information lattice diagram. Both synergistic information and redundant information can be represented by corresponding information atoms. This method is based on the non-negative decomposition theory of multivariate information proposed by Williams and Beer. In the paper, partial information decomposition (PID) is used to decompose the mutual information between microstates and macrostates. However, the PID framework can only decompose the mutual information between multiple source variables and one target variable. Rosas extended this framework and proposed the integrated information decomposition method [math]\displaystyle{ \Phi ID }[/math].

Resent Work

Barnett et al. [40] proposed the concept of dynamical decoupling by judging the decoupling of macroscopic and microscopic dynamics based on transfer entropy to judge the occurrence of emergence. That is, emergence is characterized as the macroscopic variables and microscopic variables being independent of each other and having no causal relationship, which can also be regarded as a causal emergence phenomenon. In 2024, Zhang Jiang et al. [26] proposed a new causal emergence theory based on singular value decomposition. The core idea of this theory is to point out that the so-called causal emergence is actually equivalent to the emergence of dynamical reversibility. Given the Markov transition matrix of a system, by performing singular value decomposition on it, the sum of the <math>\alpha</math> power of the singular values is defined as the reversibility measure of Markov dynamics (<math>\Gamma_{\alpha}\equiv \sum_{i=1}^N\sigma_i^{\alpha}</math>), where [math]\sigma_i[/math] is the singular value. This index is highly correlated with effective information and can also be used to characterize the causal effect strength of dynamics. According to the spectrum of singular values, this method can directly define the concepts of clear emergence and vague emergence without explicitly defining a coarse-graining scheme.