By observing the effective information during model training, including the changes in sensitivity and degeneracy, we can assess the generalization ability of the model, thereby helping researchers better understand and explain the working principles of neural networks.

===Application to the brain nervous system===
The brain nervous system is a multi-scale complex system in which emergence occurs. Based on integrated information decomposition, Luppi et al. [54] revealed a synergistic workspace underlying human consciousness. The authors constructed a three-layer architecture of brain cognition consisting of the external environment, specific modules, and a synergistic global workspace. The workspace operates in three stages: the first stage gathers information from multiple modules into the workspace; the second stage integrates the collected information within the workspace; the third stage broadcasts the global information to other parts of the brain. The authors ran experiments on three kinds of resting-state fMRI data: 100 healthy subjects, 15 subjects in an anesthesia experiment (covering three states: before anesthesia, during anesthesia, and after recovery), and 22 subjects with chronic disorders of consciousness (DOC). Using integrated information decomposition, they separated synergistic and redundant information and used the revised integrated information value <math>\Phi_R</math> to compute the synergy and redundancy between every pair of brain regions, thereby determining whether synergy or redundancy plays the dominant role in each region. By comparing conscious and unconscious subjects, they found that the regions in which the integrated information of unconscious subjects was significantly reduced all belonged to brain regions dominated by synergistic information, and that these regions largely fell within functional networks such as the DMN (Default Mode Network), thus localizing the brain regions that contribute significantly to the emergence of consciousness.
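The region-level classification step described above can be illustrated with a small sketch. This is a minimal, hypothetical example that assumes pairwise synergy and redundancy matrices (computed elsewhere, e.g. from <math>\Phi_R</math> via integrated information decomposition) are already available as NumPy arrays; the rank-based labeling below only renders the general idea and is not the authors' exact pipeline.

<syntaxhighlight lang="python">
import numpy as np

def classify_regions(synergy, redundancy):
    """Label each brain region as synergy- or redundancy-dominated.

    synergy, redundancy: (N, N) symmetric matrices of pairwise values,
    assumed to be precomputed from an integrated information
    decomposition (e.g., via Phi_R).  Schematic illustration only.
    """
    n = synergy.shape[0]
    # Average each region's interactions with all other regions.
    syn_strength = (synergy.sum(axis=1) - np.diag(synergy)) / (n - 1)
    red_strength = (redundancy.sum(axis=1) - np.diag(redundancy)) / (n - 1)
    # Rank regions by each quantity; call a region "synergy-dominated"
    # if its synergy rank exceeds its redundancy rank.
    syn_rank = syn_strength.argsort().argsort()
    red_rank = red_strength.argsort().argsort()
    return np.where(syn_rank > red_rank, "synergy", "redundancy")

# Toy usage with random matrices standing in for real fMRI-derived values.
rng = np.random.default_rng(0)
S = rng.random((5, 5)); S = (S + S.T) / 2
R = rng.random((5, 5)); R = (R + R.T) / 2
print(classify_regions(S, R))
</syntaxhighlight>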

===Application in artificial intelligence systems===

The causal emergence theory is also closely connected with the field of artificial intelligence. This connection is manifested in two main ways. First, machine learning solutions to the causal emergence identification problem are themselves applications of causal representation learning. Second, techniques such as effective information maximization are expected to find applications in fields such as causal machine learning.

====Causal representation learning====
Causal representation learning is an emerging field in artificial intelligence that attempts to combine two important branches of machine learning, representation learning and causal inference, drawing on their respective advantages to automatically extract the important features and causal relationships behind data [55]. Causal emergence identification based on effective information can be regarded as an equivalent causal representation learning task: identifying causal emergence from data amounts to learning the latent causal variables and causal mechanisms underlying the data. Specifically, the macroscopic state can be viewed as a causal variable, the macroscopic dynamics as the corresponding causal mechanism, the coarse-graining strategy as an encoding (representation) process that maps the original data to the causal variables, and the effective information as a measure of the causal effect strength of the mechanism.
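To make the role of effective information as a measure of causal effect strength concrete, the following minimal sketch computes <math>EI</math> for a discrete Markov transition matrix, using the standard definition of <math>EI</math> as the mutual information <math>I(X_t; X_{t+1})</math> under a uniform (maximum-entropy) intervention on <math>X_t</math>. The function name and the toy transition matrices are illustrative only.

<syntaxhighlight lang="python">
import numpy as np

def effective_information(tpm):
    """EI of a Markov chain given its transition probability matrix.

    tpm[i, j] = P(X_{t+1} = j | X_t = i).  EI is the mutual information
    I(X_t; X_{t+1}) when X_t is intervened to be uniformly distributed.
    """
    n = tpm.shape[0]
    p_x = np.full(n, 1.0 / n)              # do(X_t ~ Uniform)
    p_y = p_x @ tpm                        # resulting effect distribution
    joint = tpm * p_x[:, None]             # P(X_t = i, X_{t+1} = j)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(joint > 0, joint / (p_x[:, None] * p_y[None, :]), 1.0)
    return float(np.sum(joint * np.log2(ratio)))

# A deterministic, non-degenerate 4-state dynamics attains the maximum EI = log2(4) = 2 bits.
tpm_det = np.eye(4)[[1, 2, 3, 0]]                     # a permutation: fully deterministic
print(effective_information(tpm_det))                 # -> 2.0
# A fully random dynamics has EI = 0.
print(effective_information(np.full((4, 4), 0.25)))   # -> 0.0
</syntaxhighlight>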

Since there are many similarities between the two, techniques and concepts can be borrowed from one field by the other. For example, causal representation learning techniques can be applied to causal emergence identification; in turn, a learned abstract causal representation can be interpreted as a macroscopic state, thereby enhancing the interpretability of causal representation learning. However, there are also significant differences, mainly two: 1) causal representation learning assumes that a real causal mechanism exists behind the data and that the data are generated by this mechanism, whereas there may not be a "true causal relationship" between the states and dynamics that emerge at the macroscopic level; 2) the coarse-grained macroscopic state in causal emergence is a low-dimensional description, whereas causal representation learning imposes no such requirement. From an epistemological perspective, however, there is no essential difference between the two, because both extract effective information from observational data to obtain representations with stronger causal effects.

To compare the two tasks more directly, the correspondence between causal representation learning and causal emergence identification is summarized in the following table:

{| class="wikitable" style="text-align:center;"
|+Comparison of causal representation learning and causal emergence identification
|-
!Comparison!!Causal representation learning!!Causal emergence identification
|-
|'''Data'''||Original data generated by real causal mechanisms||Observations of microscopic states (time series)
|-
|'''Latent variable'''||Causal representation||Macroscopic state
|-
|'''Causal mechanism'''||Causal mechanism||Macroscopic dynamics
|-
|'''Mapping between data and latent variables'''||Representation||Coarse-graining function
|-
|'''Causal relationship optimization'''||Prediction loss, disentanglement||EI maximization
|-
|'''Goal'''||Finding the optimal representation of the original data such that independent causal mechanisms hold over the representation||Finding an effective coarse-graining strategy and macroscopic dynamics with strong causal effects
|}

====Application of effective information in causal machine learning====

Causal emergence can enhance the performance of machine learning in out-of-distribution scenarios. The do-intervention introduced in <math>EI</math> captures the causal dependence in the data-generating process and suppresses spurious correlations, thus supplementing association-based machine learning algorithms and establishing a connection between <math>EI</math> and out-of-distribution (OOD) generalization [56]. Owing to the universality of effective information, causal emergence can be applied in supervised machine learning to evaluate the strength of the causal relationship between the feature space <math>X</math> and the target space <math>Y</math>, thereby improving prediction from cause (features) to effect (target). It is worth noting that directly fitting the mapping from <math>X</math> to <math>Y</math> is sufficient for common prediction tasks under the i.i.d. assumption, i.e., when training and test data are independently and identically distributed. However, if samples are drawn from outside the training distribution, a representation space that generalizes from the training to the test environment must be learned. Since causal relationships are generally believed to generalize better than statistical correlations [57], the causal emergence theory can serve as a criterion for embedding causal relationships in the representation space. The occurrence of causal emergence reveals the latent causal factors of the target, thereby producing a representation space that is robust for out-of-distribution generalization; causal emergence may thus provide a unified, causality-based representation measure for OOD generalization. <math>EI</math> can also be regarded as an information-theoretic abstraction of the reweighting-based debiasing techniques used in OOD generalization. In addition, we conjecture that OOD generalization can be achieved while maximizing <math>EI</math>, and that <math>EI</math> may peak at an intermediate level of abstraction of the original features, which is consistent with the "less is more" idea in OOD generalization. Ideally, when causal emergence occurs at the peak of <math>EI</math>, all non-causal features are excluded and the causal features are revealed, yielding the most informative representation.
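As one way such an <math>EI</math>-based score of feature-target causal strength could look in practice, the following sketch discretizes a single feature and the target into bins, estimates the conditional distribution <math>P(Y|X)</math>, and evaluates <math>EI</math> under a uniform intervention on the binned feature. The binning scheme, function names, and synthetic data are purely hypothetical choices for illustration, not a procedure prescribed by the cited work.

<syntaxhighlight lang="python">
import numpy as np

def ei_from_conditional(p_y_given_x):
    """EI of a discrete mechanism P(Y|X) under do(X ~ Uniform)."""
    n = p_y_given_x.shape[0]
    p_x = np.full(n, 1.0 / n)
    p_y = p_x @ p_y_given_x
    joint = p_y_given_x * p_x[:, None]
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (p_x[:, None] * p_y[None, :])[nz])))

def feature_target_ei(x, y, bins=8):
    """Score one feature: bin x and y, estimate P(y_bin | x_bin), return EI."""
    xb = np.digitize(x, np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1]))
    yb = np.digitize(y, np.quantile(y, np.linspace(0, 1, bins + 1)[1:-1]))
    counts = np.zeros((bins, bins))
    np.add.at(counts, (xb, yb), 1)
    cond = counts / np.clip(counts.sum(axis=1, keepdims=True), 1, None)
    return ei_from_conditional(cond)

# Synthetic example: a causal feature versus a pure-noise feature.
rng = np.random.default_rng(1)
x_causal = rng.normal(size=5000)
x_noise = rng.normal(size=5000)
y = np.sin(x_causal) + 0.1 * rng.normal(size=5000)
print(feature_target_ei(x_causal, y))  # noticeably larger
print(feature_target_ei(x_noise, y))   # close to zero
</syntaxhighlight>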

=====Causal model abstraction=====
In complex systems, microscopic states are often noisy, so people coarse-grain them into macroscopic states with less noise, making the causality of the macroscopic dynamics stronger. The same holds for causal models that explain various kinds of data: because the original model is too complex or computing resources are limited, people often need a more abstract causal model while ensuring that the abstract model preserves the causal mechanism of the original one as faithfully as possible. This is so-called causal model abstraction.

Causal model abstraction is a subfield of artificial intelligence that plays an important role especially in causal inference and model interpretability. Such abstraction can help us better understand the hidden causal mechanisms in data and the interactions between variables. Causal model abstraction is performed by optimizing a high-level model so that it simulates the causal effects of a low-level model as faithfully as possible [58]. If a high-level model can reproduce the causal effects of a low-level model, we call this high-level model a causal abstraction of the low-level model.

Causal model abstraction also concerns the interaction between causal relationships and model abstraction (which can be regarded as a coarse-graining process) [59]. Causal emergence identification and causal model abstraction therefore have much in common: the original causal mechanism can be understood as the microscopic dynamics, and the abstracted mechanism as the macroscopic dynamics. In the neural information compression framework (NIS), the researchers place restrictions on the coarse-graining strategy and the macroscopic dynamics, requiring the microscopic prediction error of the macroscopic dynamics to be small enough to rule out trivial solutions. This requirement resembles causal model abstraction, which also requires the abstracted causal model to be as similar as possible to the original model. There are, however, some differences: 1) causal emergence identification coarse-grains states or data, whereas causal model abstraction coarse-grains models; 2) causal model abstraction takes confounding factors into account, a point that is ignored in discussions of causal emergence identification.
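The requirement that the abstract model preserve the behavior of the original one can be illustrated with a small consistency check: coarse-grain a low-level transition matrix with a grouping map and measure how far the partition is from commuting with the dynamics (a lumpability check). The grouping maps, the toy matrix, and the error measure below are illustrative assumptions, not a specific method from the cited works.

<syntaxhighlight lang="python">
import numpy as np

def lump(tpm, groups):
    """Coarse-grain a micro transition matrix by averaging within groups.

    groups[i] = index of the macro state that micro state i belongs to.
    Rows are averaged uniformly within a group and columns are summed.
    One simple, illustrative choice of coarse-graining.
    """
    groups = np.asarray(groups)
    A = np.zeros((len(groups), groups.max() + 1))   # aggregation matrix
    A[np.arange(len(groups)), groups] = 1.0
    W = A / A.sum(axis=0, keepdims=True)            # uniform weights per group
    return W.T @ tpm @ A                            # macro transition matrix

def abstraction_error(tpm, groups):
    """How far the partition is from being a consistent abstraction.

    For each micro state, compute its transition probabilities into the
    macro states; a good abstraction makes these rows identical within
    each group.  The error is the total within-group deviation.
    """
    groups = np.asarray(groups)
    A = np.zeros((len(groups), groups.max() + 1))
    A[np.arange(len(groups)), groups] = 1.0
    rows = tpm @ A                                  # micro -> macro transition rows
    err = 0.0
    for g in range(groups.max() + 1):
        block = rows[groups == g]
        err += np.abs(block - block.mean(axis=0)).sum()
    return float(err)

# Toy 4-state micro dynamics whose states {0,1} and {2,3} behave as blocks.
tpm = np.array([[0.00, 0.20, 0.40, 0.40],
                [0.20, 0.00, 0.30, 0.50],
                [0.45, 0.45, 0.10, 0.00],
                [0.40, 0.50, 0.00, 0.10]])
print(lump(tpm, [0, 0, 1, 1]))                # the abstracted macro model
print(abstraction_error(tpm, [0, 0, 1, 1]))   # 0.0: a consistent abstraction
print(abstraction_error(tpm, [0, 1, 0, 1]))   # > 0: a poor abstraction
</syntaxhighlight>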

=====Reinforcement learning based on world models=====

Reinforcement learning based on world models assumes that the reinforcement learning agent maintains an internal world model with which it can simulate the dynamics of the environment it faces [60]. The dynamics of the world model are learned from the interactions between the agent and the environment, which helps the agent plan and make decisions under uncertainty. At the same time, in order to represent a complex environment, the world model must be a coarse-grained description of that environment. A typical world model architecture always contains an encoder and a decoder.
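The encoder-dynamics-decoder structure mentioned above can be sketched as follows. This is a minimal, hypothetical PyTorch-style skeleton of a deterministic world model (real world models typically use recurrent, stochastic latent states and additional losses); all module names and dimensions are illustrative assumptions.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

class TinyWorldModel(nn.Module):
    """Minimal world model: encode an observation into a coarse-grained
    latent (macro) state, predict its next value given the action, decode."""

    def __init__(self, obs_dim=64, action_dim=4, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(),
                                     nn.Linear(32, latent_dim))
        self.dynamics = nn.Sequential(nn.Linear(latent_dim + action_dim, 32),
                                      nn.ReLU(), nn.Linear(32, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                                     nn.Linear(32, obs_dim))

    def forward(self, obs, action):
        z = self.encoder(obs)                                    # coarse-graining
        z_next = self.dynamics(torch.cat([z, action], dim=-1))   # macro dynamics
        return self.decoder(z_next), z, z_next                   # predicted next observation

# One illustrative training step: learn to predict the next observation.
model = TinyWorldModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
obs, action, next_obs = torch.randn(16, 64), torch.randn(16, 4), torch.randn(16, 64)
pred_next_obs, _, _ = model(obs, action)
loss = nn.functional.mse_loss(pred_next_obs, next_obs)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
</syntaxhighlight>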

Reinforcement learning based on world models also has much in common with causal emergence identification. The world model can be regarded as a macroscopic dynamics, and its internal states can be regarded as macroscopic states: compressed descriptions of the environment that ignore irrelevant information while capturing its most important causal features, so that the agent can make better decisions. During planning, the agent can likewise use the world model to simulate the dynamics of the real environment.

The similarities between the two fields allow ideas and techniques to be transferred from one to the other. For example, an agent equipped with a world model can interact with a complex system as a whole and extract emergent causal laws from these interactions, thereby better supporting the task of causal emergence identification. In turn, the technique of maximizing effective information can also be used in reinforcement learning so that the learned world model has stronger causal characteristics.