These experiments show that NIS+ can not only identify causal emergence in data and discover emergent macroscopic dynamics and coarse-graining strategies, but can also improve a model's out-of-distribution generalization ability through EI maximization.

==Applications==
This subsection mainly explains the potential applications of causal emergence in various [[Complex Systems|complex systems]], including: biological systems, [[neural networks]], brain nervous systems, [[artificial intelligence]] ([[causal representation learning]], [[reinforcement learning based on world models]], [[causal model abstraction]]) and some other potential applications (including consciousness research and Chinese classical philosophy).
===Causal emergence in complex networks===
In 2020, Klein and Hoel extended the method for quantifying causal emergence on Markov chains to [[complex networks]] <ref>Klein B, Hoel E. The emergence of informative higher scales in complex networks[J]. Complexity, 2020, 2020: 1-12.</ref>. The authors define a [[Markov chain]] on the network by means of [[random walkers]]: placing a random walker on a node is equivalent to intervening on that node, and the transition probability matrix between nodes is given by the random-walk probabilities. At the same time, the authors connect [[effective information]] to the connectivity of the network, which can be characterized by the uncertainty in the weights of a node's outgoing and incoming edges; on this basis, effective information on complex networks is defined. For details, see [[Causal emergence in complex networks]].
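
As an illustration, this quantity (the mean KL divergence between each node's out-transition distribution and the average such distribution) can be sketched in a few lines. This is a minimal re-implementation for illustration, not the authors' own code; the example network is an arbitrary choice:

```python
import numpy as np

def effective_information(A):
    """EI (in bits) of the random-walk Markov chain on adjacency matrix A.

    Each row of W is a node's out-transition distribution; EI is the mean
    KL divergence between the rows and the average row.
    """
    A = np.asarray(A, dtype=float)
    W = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    W_avg = W.mean(axis=0)                 # the "average" node's distribution
    logs = np.where(W > 0, np.log2(np.where(W > 0, W, 1.0) / W_avg), 0.0)
    return float((W * logs).sum(axis=1).mean())

# A deterministic directed 4-cycle: each node maps to a unique successor,
# so EI attains its maximum value log2(N) = 2 bits.
cycle = np.array([[0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1],
                  [1, 0, 0, 0]])
print(effective_information(cycle))  # 2.0
```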
The authors compared artificial networks, such as the [[random network model|random network]] (ER) and the [[preferential attachment network model]] (PA), with four types of real networks, and found the following. For ER networks, the effective information depends only on the connection probability <math>p</math> and, as the network size grows, converges to <math>-\log_2p</math>. A key finding is a phase transition point in the EI value, appearing approximately where the [[average degree]] <math>\langle k\rangle</math> of the network equals <math>\log_2N</math>: beyond this [[phase transition point]], the random network's structure gains no further information as its size increases with growing connection probability. For preferential attachment networks, when the power-law exponent of the network's [[degree distribution]] satisfies <math>\alpha<1.0</math>, effective information increases with network size; when <math>\alpha>1.0</math>, the opposite holds; and <math>\alpha = 1.0</math> corresponds exactly to the [[scale-free network]], the critical boundary of growth. For real networks, the authors found that biological networks have the lowest effective information because they contain substantial noise; this noise can, however, be removed by effective coarse-graining, so biological networks exhibit more significant causal emergence than other types of networks. Technological networks, in contrast, are sparse and non-degenerate, with higher average efficiency, more specific node relationships and the highest effective information, but it is difficult to increase their causal emergence measure by coarse-graining.
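
The convergence of ER-network effective information to <math>-\log_2p</math> can be checked numerically. A minimal sketch using the EI definition above; the network size, connection probability and random seed here are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def ei(W):
    """EI (bits) of a row-stochastic matrix W: mean KL(W_i || average row)."""
    W_avg = W.mean(axis=0)
    logs = np.where(W > 0, np.log2(np.where(W > 0, W, 1.0) / W_avg), 0.0)
    return float((W * logs).sum(axis=1).mean())

N, p = 1000, 0.1
A = (rng.random((N, N)) < p).astype(float)   # directed ER network G(N, p)
np.fill_diagonal(A, 0)                       # no self-loops
A[A.sum(axis=1) == 0, :] = 1.0               # guard: isolated nodes get uniform edges
W = A / A.sum(axis=1, keepdims=True)

print(ei(W), -np.log2(p))  # both close to ~3.32 bits for p = 0.1
```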
In this article, the authors use a [[greedy algorithm]] to coarse-grain the network, which is very inefficient for large-scale networks. Griebenow et al. <ref>Griebenow R, Klein B, Hoel E. Finding the right scale of a network: efficient identification of causal emergence through spectral clustering[J]. arXiv preprint arXiv:1908.07565, 2019.</ref> therefore proposed a method based on [[spectral clustering]] to identify causal emergence in [[preferential attachment networks]]. Compared with the [[greedy algorithm]] and the [[gradient descent algorithm]], the [[spectral clustering algorithm]] requires less computation time, and the macroscopic networks it finds also show more significant causal emergence.
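
The basic effect, that grouping micro nodes can raise effective information, can be illustrated on a tiny hand-built example. Note that the grouping below is picked by hand for illustration; it is not produced by the greedy or spectral clustering algorithms discussed above:

```python
import numpy as np

def ei(W):
    """EI (bits) of a row-stochastic matrix W: mean KL(W_i || average row)."""
    W_avg = W.mean(axis=0)
    logs = np.where(W > 0, np.log2(np.where(W > 0, W, 1.0) / W_avg), 0.0)
    return float((W * logs).sum(axis=1).mean())

# Micro scale: four interchangeable ("degenerate") nodes all jump to a hub,
# and the hub returns to one of them uniformly at random.
W_micro = np.array([
    [0.00, 0.00, 0.00, 0.00, 1.0],
    [0.00, 0.00, 0.00, 0.00, 1.0],
    [0.00, 0.00, 0.00, 0.00, 1.0],
    [0.00, 0.00, 0.00, 0.00, 1.0],
    [0.25, 0.25, 0.25, 0.25, 0.0],
])

# Macro scale: merge the four interchangeable nodes into one macro node.
groups = [[0, 1, 2, 3], [4]]
W_macro = np.array([[W_micro[np.ix_(g, h)].sum(axis=1).mean()
                     for h in groups] for g in groups])

print(ei(W_micro), ei(W_macro))  # macro EI exceeds micro EI: causal emergence
```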
===Application on biological networks===
Furthermore, Klein et al. extended the method of [[causal emergence in complex networks]] to more biological networks. As mentioned earlier, [[biological networks]] are noisier, which makes their internal operating principles harder to understand. This noise partly stems from the inherent noise of the system itself and is partly introduced by measurement or observation. Klein et al. <ref>Klein B, Swain A, Byrum T, et al. Exploring noise, degeneracy and determinism in biological networks with the einet package[J]. Methods in Ecology and Evolution, 2022, 13(4): 799-804.</ref> further explored the relationships among noise, [[degeneracy]] and [[determinism]] in biological networks, together with their specific meanings, and drew some interesting conclusions.
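
A minimal sketch of the determinism-degeneracy decomposition these quantities rest on (an illustration of the underlying definitions, not the einet package itself): determinism is <math>\log_2N</math> minus the average row entropy of the transition matrix, degeneracy is <math>\log_2N</math> minus the entropy of the average row, and effective information is their difference. The toy matrix is a deliberately extreme case, fully deterministic yet fully degenerate:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def determinism_degeneracy(W):
    """Determinism/degeneracy decomposition of a row-stochastic matrix W.

    determinism = log2(N) - mean row entropy: how reliably each node acts.
    degeneracy  = log2(N) - entropy of the average row: how much the nodes'
    targets overlap. Effective information is det - deg.
    """
    N = W.shape[0]
    det = np.log2(N) - np.mean([entropy(row) for row in W])
    deg = np.log2(N) - entropy(W.mean(axis=0))
    return det, deg

# Extreme case: every node maps deterministically to node 0. The dynamics are
# fully deterministic yet fully degenerate, so the two terms cancel and EI = 0.
W = np.zeros((4, 4))
W[:, 0] = 1.0
det, deg = determinism_degeneracy(W)
print(det, deg, det - deg)  # 2.0 2.0 0.0
```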
For example, high [[determinism]] in a gene expression network means that the expression of one gene almost certainly leads to the expression of another. At the same time, high [[degeneracy]] is widespread in biological systems over the course of evolution. Together, these two factors make it unclear at what scale biological systems should be analyzed to best understand their functions. Klein et al. <ref>Klein B, Hoel E, Swain A, et al. Evolution and emergence: higher order information structure in protein interactomes across the tree of life[J]. Integrative Biology, 2021, 13(12): 283-294.</ref> analyzed the [[protein interaction networks]] of more than 1800 species and found that networks at macroscopic scales have less noise and [[degeneracy]]; moreover, nodes participating in macroscopic-scale networks are more resilient than those that do not. Thus, to meet the demands of evolution, biological networks need to evolve macroscopic scales that increase certainty, enhancing [[network resilience]] and the effectiveness of information transmission.
Hoel et al. <ref>Hoel E, Levin M. Emergence of informative higher scales in biological systems: a computational toolkit for optimal prediction and control[J]. Communicative & Integrative Biology, 2020, 13(1): 108-118.</ref> further studied causal emergence in biological systems with the help of effective information theory. The authors applied effective information to gene regulatory networks to identify the most informative model of mammalian heart development. By quantifying causal emergence in the largest connected component of the Saccharomyces cerevisiae gene network, the article shows that informative macroscopic scales are ubiquitous in biology, and that life's mechanisms themselves often operate on macroscopic scales. It also provides biologists with a computable tool for identifying the most informative macroscopic scale, on which complex biological systems can then be modeled, predicted, controlled and understood.
Swain et al. <ref>Swain A, Williams S D, Di Felice L J, et al. Interactions and information: exploring task allocation in ant colonies using network analysis[J]. Animal Behaviour, 2022, 189: 69-81.</ref> explored how the interaction history of an ant colony influences task allocation and task switching, using effective information to study how noise spreads among ants. They found that the amount of historical interaction between ants affects task allocation, and that the functional types of the ants involved in a given interaction determine the noise in that interaction. In addition, even when ants switch functional groups, the emergent cohesion of the colony ensures its stability, and ants of different functional groups play different roles in maintaining that cohesion.
===Application on artificial neural networks===
Marrow et al. <ref>Marrow S, Michaud E J, Hoel E. Examining the Causal Structures of Deep Neural Networks Using Information Theory[J]. Entropy, 2020, 22(12): 1429.</ref> attempted to introduce effective information into neural networks in order to quantify and track changes in a network's causal structure during training. Here, effective information is used to evaluate the degree of causal influence that the nodes and edges of each layer exert on the downstream target. The effective information EI of a layer of the neural network is defined as:

<math>
I(L_1;L_2|do(L_1=H^{max}))
</math>

Here, <math>L_1</math> and <math>L_2</math> denote the input and output layers of a layer-to-layer connection in the neural network. The input layer as a whole is do-intervened to follow a uniform (maximum-entropy) distribution, and the mutual information between cause and effect is then computed. Effective information can be decomposed into sensitivity and degeneracy, where sensitivity is defined as:

<math>
\sum_{(i \in L_1,j \in L_2)}I(t_i;t_j|do(i=H^{max}))
</math>

Here, <math>i</math> and <math>j</math> denote individual neurons in the input and output layers, respectively, and <math>t_i</math> and <math>t_j</math> denote the states of these neurons after neuron <math>i</math> is intervened to follow the maximum-entropy distribution, with the mechanism of the neural network held fixed. In other words, if input neuron <math>i</math> is intervened to follow a uniform distribution, the output neurons change accordingly, and this quantity measures the mutual information between the two.
This should be distinguished from the definition of effective information: here, each neuron in the input layer is do-intervened separately, and the resulting pairwise mutual information terms are summed to give the sensitivity. Degeneracy is then the difference between sensitivity and effective information:

<math>
\sum_{(i \in L_1,j \in L_2)}I(t_i;t_j|do(i=H^{max}))-I(L_1;L_2|do(L_1=H^{max}))
</math>.

By observing the changes in effective information, including sensitivity and degeneracy, during model training, we can assess the model's generalization ability, which helps scholars better understand and explain the working principles of neural networks.
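
The three quantities can be sketched end to end for a single tiny deterministic layer. The layer weights below are hypothetical, and clamping the non-intervened inputs to zero during the per-neuron interventions is an assumed convention rather than necessarily the exact protocol of Marrow et al.; degeneracy is computed as the nonnegative gap between the summed pairwise sensitivity and the layer-level EI:

```python
import itertools
import numpy as np

def mi_binary(pairs):
    """Mutual information (bits) between two binary variables from joint samples."""
    pairs = list(pairs)
    n = len(pairs)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for x, y in pairs if x == a and y == b) / n
            p_a = sum(1 for x, _ in pairs if x == a) / n
            p_b = sum(1 for _, y in pairs if y == b) / n
            if p_ab > 0:
                mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return mi

W = np.array([[1.0, 1.0, 0.0],   # hypothetical weights of a 3-in, 2-out layer
              [0.0, 1.0, 1.0]])

def layer(x):
    """Deterministic mechanism: thresholded weighted sums."""
    return (W @ np.array(x) >= 1).astype(int)

n_in, n_out = 3, 2

# EI: do-intervene on the whole input layer with the uniform (max-entropy)
# distribution; for a deterministic mechanism, I(L1;L2) equals H(L2).
inputs = list(itertools.product([0, 1], repeat=n_in))
outs = [tuple(layer(x)) for x in inputs]
p_out = [outs.count(y) / len(outs) for y in set(outs)]
ei = -sum(p * np.log2(p) for p in p_out)

# Sensitivity: do-intervene on one input neuron at a time (others clamped to 0)
# and sum the mutual information over all input-output neuron pairs.
sensitivity = 0.0
for i in range(n_in):
    samples = [(t, layer([0] * i + [t] + [0] * (n_in - i - 1))) for t in (0, 1)]
    for j in range(n_out):
        sensitivity += mi_binary((t, y[j]) for t, y in samples)

# Degeneracy: the overlap that the pairwise sum counts but layer-level EI does not.
degeneracy = sensitivity - ei
print(ei, sensitivity, degeneracy)
```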