Causal Emergence

Causal emergence refers to a special kind of emergence phenomenon in dynamical systems, that is, the system will exhibit stronger causal characteristics on a macroscopic scale. In particular, for a class of Markov dynamical systems, after appropriate coarse-graining of its state space, the formed macroscopic dynamics will exhibit stronger causal characteristics than the microscopic ones. Then it is said that causal emergence occurs in this system [1][2]. At the same time, the causal emergence theory is also a theory that uses causal effect measures to quantify emergence phenomena in complex systems.

1. History

Development of related concepts

The causal emergence theory is a theory that attempts to answer the question of what emergence is from a phenomenological perspective using a causal-based quantitative research method. Therefore, the development of causal emergence is closely related to people's understanding and development of concepts such as emergence and causality.

Emergence

Emergence has always been an important characteristic in complex systems and a core concept in many discussions about system complexity and the relationship between the macroscopic and microscopic levels [3][4]. Emergence can be simply understood as the whole being greater than the sum of its parts, that is, the whole exhibits new characteristics that the individuals constituting it do not possess [5]. Although scholars have pointed out the existence of emergence phenomena in various fields [4][6], such as the collective behavior of birds [7], the formation of consciousness in the brain, and the emergent capabilities of large language models [8], there is currently no universally accepted unified understanding of this phenomenon. Previous research on emergence mostly stayed at the qualitative stage. For example, Bedau et al. [9][10] conducted classified research on emergence, dividing emergence into nominal emergence [11][12], weak emergence [9][13], and strong emergence [14][15].

Nominal emergence can be understood as attributes and patterns that can be possessed by the macroscopic level but not by the microscopic level. For example, the shape of a circle composed of several pixels is a kind of nominal emergence [11][12].

Weak emergence refers to the fact that macroscopic-level attributes or processes are generated by complex interactions between individual components. Or weak emergence can also be understood as a characteristic that can be simulated by a computer in principle. Due to the principle of computational irreducibility, even if weak emergence characteristics can be simulated, they still cannot be easily reduced to microscopic-level attributes. For weak emergence, the causes of its pattern generation may come from both microscopic and macroscopic levels [14][15]. Therefore, the causal relationship of emergence may coexist with microscopic causal relationships.

As for strong emergence, there are many controversies. It refers to macroscopic-level attributes that cannot be reduced to microscopic-level attributes in principle, including the interactions between individuals. In addition, Jochen Fromm further interprets strong emergence as the causal effect of downward causation [16]. Downward causation refers to the causal force from the macroscopic level to the microscopic level. However, there are many controversies about the concept of downward causation itself [17][18].

From these early studies, it can be seen that emergence has a natural and profound connection with causality.

Causality and its measurement

The so-called causality refers to the mutual influence between events. Causality is not equal to correlation, which is manifested in that not only will B occur when A occurs, but also if A does not occur, then B will not occur. Only by intervening in event A and then examining the result of B can people detect whether there is a causal relationship between A and B.

With the further development of causal science in recent years, people can use a mathematical framework to quantify causality. Causality describes the causal effect of a dynamical process [19][20][21]. Judea Pearl [21] uses probabilistic graphical models to describe causal interactions. Pearl uses different models to distinguish and quantify three levels of causality. Here we are more concerned with the second level in the causal ladder: intervening in the input distribution. In addition, due to the uncertainty and ambiguity behind the discovered causal relationships, measuring the degree of causal effect between two variables is another important issue. Many independent historical studies have addressed the issue of measuring causal relationships. These measurement methods include Hume's concept of constant connection [22] and value function-based methods [23], Eells and Suppes' probabilistic causal measures [24][25], and Judea Pearl's causal measure indicators, etc. [19].

Causal emergence

Emergence and causality are also interconnected: on the one hand, emergence is the causal effect of complex nonlinear interactions among the components of a complex system; on the other hand, emergent characteristics will also have causal relationships with individuals in complex systems. In addition, in the past, people were accustomed to attributing macroscopic factors to the influence of microscopic factors. However, macroscopic emergent patterns often cannot find microscopic attributions, so corresponding causes cannot be found. Thus, there is a profound connection between emergence and causality. On the other hand, although we have a qualitative classification of emergence, we cannot quantitatively characterize the occurrence of emergence. Therefore, we can use causality to quantitatively characterize the occurrence of emergence.

In 2013, Erik Hoel, an American theoretical neurobiologist, tried to introduce causality into the measurement of emergence, proposed the concept of causal emergence, and used effective information (EI for short) to quantify the strength of causality in system dynamics [1][2]. Causal emergence can be described as: when a system has a stronger causal effect on a macroscopic scale compared to its microscopic scale, causal emergence occurs. Causal emergence well characterizes the differences and connections between the macroscopic and microscopic states of a system. At the same time, it combines the two core concepts of causality in artificial intelligence and emergence in complex systems. Causal emergence also provides scholars with a quantitative perspective to answer a series of philosophical questions. For example, the top-down causal characteristics in life systems or social systems can be discussed with the help of the causal emergence framework. The top-down causation here refers to downward causation [26], indicating the existence of macroscopic-to-microscopic causal effects. For example, in the phenomenon of a gecko breaking its tail. When in danger, the gecko breaks its tail directly without asking for the tail's advice. Here, the whole is the cause and the tail is the effect. Then there is a causal force from the whole pointing to the part.

Early work on quantifying emergence

There have been some related works in the early stage that attempted to quantitatively analyze emergence. The computational mechanics theory proposed by Crutchfield et al. [27] considers causal states. This method discusses related concepts based on the division of state space and is very similar to Erik Hoel's causal emergence theory. On the other hand, Seth et al. proposed the G-emergence theory [28] to quantify emergence by using Granger causality. Computational mechanics The computational mechanics theory attempts to express the causal laws of emergence in a quantitative framework, that is, how to construct a coarse-grained causal model from a random process so that this model can generate the time series of the observed random process [27].

Here, the random process can be represented by [math]\displaystyle{ \overleftrightarrow{s} }[/math]. Based on time [math]\displaystyle{ t }[/math], the random process can be divided into two parts: the process before time [math]\displaystyle{ t }[/math] and the process after time [math]\displaystyle{ t }[/math], [math]\displaystyle{ \overleftarrow{s_t} }[/math] and [math]\displaystyle{ \overrightarrow{s_t} }[/math]. Computational mechanics records the set of all possible historical processes [math]\displaystyle{ \overleftarrow{s_t} }[/math] as [math]\displaystyle{ \overleftarrow{S} }[/math], and the set of all future processes as [math]\displaystyle{ \overrightarrow{S} }[/math].

The goal of computational mechanics is to establish a model that hopes to reconstruct and predict the observed random sequence in a certain degree of accuracy. However, the randomness of the sequence makes it impossible for us to obtain a perfect reconstruction. Therefore, we need a coarse-grained mapping to capture the ordered structure in the random sequence. This coarse-grained mapping can be characterized by a partitioning function [math]\displaystyle{ \eta: \overleftarrow{S}→\mathcal{R} }[/math], which can divide [math]\displaystyle{ \overleftarrow{S} }[/math] into several mutually exclusive subsets (all mutually exclusive subsets form the complete set), and the formed set is denoted as [math]\displaystyle{ \mathcal{R} }[/math].

Computational mechanics regards any subset [math]\displaystyle{ R \in \mathcal{R} }[/math] as a macroscopic state. For a set of macroscopic state sets [math]\displaystyle{ \mathcal{R} }[/math], computational mechanics uses Shannon entropy to define its statistical complexity index [math]\displaystyle{ C_\mu }[/math] to measure the complexity of the state. The following equation:

[math]\displaystyle{ C_\mu(\mathcal{R})\triangleq -\sum_{\rho\in \mathcal{R}} P(\mathcal{R}=\rho)\log_2 P(\mathcal{R}=\rho) }[/math]

It can be proved that when a set of states is used to build a prediction model, statistical complexity is approximately equivalent to the size of the prediction model.

Furthermore, in order to achieve the best balance between predictability and simplicity for the set of macroscopic states, computational mechanics defines the concept of causal equivalence. If [math]\displaystyle{ P\left ( \overrightarrow{s}|\overleftarrow{s}\right )=P\left ( \overrightarrow{s}|{\overleftarrow{s}}'\right ) }[/math], then [math]\displaystyle{ \overleftarrow{s} }[/math] and [math]\displaystyle{ {\overleftarrow{s}}' }[/math] are causally equivalent. This equivalence relation can divide all historical processes into equivalence classes and define them as causal states. All causal states of the historical process [math]\displaystyle{ \overleftarrow{s} }[/math] can be characterized by a mapping [math]\displaystyle{ \epsilon \left ( \overleftarrow{s} \right ) }[/math]. Here, [math]\displaystyle{ \epsilon: \overleftarrow{\mathcal{S}}\rightarrow 2^{\overleftarrow{\mathcal{S}}} }[/math] is a function that maps the historical process [math]\displaystyle{ \overleftarrow{s} }[/math] to the causal state [math]\displaystyle{ \epsilon(\overleftarrow{s})\in 2^{\overleftarrow{\mathcal{S}}} }[/math].

Further, we can denote the causal transition probability between two causal states [math]\displaystyle{ S_i }[/math] and [math]\displaystyle{ S_j }[/math] as [math]\displaystyle{ T_{ij}^{\left ( s \right )} }[/math], which is similar to a coarsened macroscopic dynamics. The [math]\displaystyle{ \epsilon }[/math]-machine of a random process is defined as an ordered pair [math]\displaystyle{ \left \{ \epsilon,T \right \} }[/math]. This is a pattern discovery machine that can achieve prediction by learning the [math]\displaystyle{ \epsilon }[/math] and [math]\displaystyle{ T }[/math] functions. This is equivalent to defining the so-called identification problem of emergent causality. Here, the [math]\displaystyle{ \epsilon }[/math]-machine is a machine that attempts to discover emergent causality in data.

Computational mechanics can prove that the causal states obtained through the [math]\displaystyle{ \epsilon }[/math]-machine have three important characteristics: "maximum predictability", "minimum statistical complexity", and "minimum randomness", and it is verified that it is optimal in a certain sense. In addition, the author introduces a hierarchical machine reconstruction algorithm that can calculate causal states and [math]\displaystyle{ \epsilon }[/math]-machines from observational data. Although this algorithm may not be applicable to all scenarios, the author takes chaotic dynamics, hidden Markov models, and cellular automata as examples and gives numerical calculation results and corresponding machine reconstruction paths.

Although the original computational mechanics does not give a clear definition and quantitative theory of emergence, some researchers later further advanced the development of this theory. Shalizi et al. discussed the relationship between computational mechanics and emergence in their work. If process [math]\displaystyle{ {\overleftarrow{s}}' }[/math] has higher prediction efficiency than process [math]\displaystyle{ \overleftarrow{s} }[/math], then emergence occurs in process [math]\displaystyle{ {\overleftarrow{s}}' }[/math]. The prediction efficiency [math]\displaystyle{ e }[/math] of a process is defined as the ratio of its excess entropy to its statistical complexity ([math]\displaystyle{ e=\frac{E}{C_{\mu}} }[/math]). [math]\displaystyle{ e }[/math] is a real number between 0 and 1. We can regard it as a part of the historical memory stored in the process. In two cases, [math]\displaystyle{ C_{\mu}=0 }[/math]. One is that this process is completely uniform and deterministic; the other is that it is independently and identically distributed. In both cases, there cannot be any interesting predictions, so we set [math]\displaystyle{ e=0 }[/math]. At the same time, the author explains that emergence can be understood as a dynamical process in which a pattern gains the ability to adapt to different environments.

The causal emergence framework has many similarities with computational mechanics. All historical processes [math]\displaystyle{ \overleftarrow{s} }[/math] can be regarded as microscopic states. All [math]\displaystyle{ R \in \mathcal{R} }[/math] correspond to macroscopic states. The function [math]\displaystyle{ \eta }[/math] can be understood as a possible coarse-graining function. The causal state [math]\displaystyle{ \epsilon \left ( \overleftarrow{s} \right ) }[/math] is a special state that can at least have the same predictive power as the microscopic state [math]\displaystyle{ \overleftarrow{s} }[/math]. Therefore, [math]\displaystyle{ \epsilon }[/math] can be understood as an effective coarse-graining strategy. Causal transfer [math]\displaystyle{ T }[/math] corresponds to effective macroscopic dynamics. The characteristic of minimum randomness characterizes the determinism of macroscopic dynamics and can be measured by effective information in causal emergence.