==Applications==
This subsection surveys potential applications of causal emergence in various [[complex systems]], including biological systems, [[neural networks]], the brain nervous system, [[artificial intelligence]] ([[causal representation learning]], [[reinforcement learning based on world models]], [[causal model abstraction]]), and some other potential applications (including consciousness research and Chinese classical philosophy).

===Causal emergence in complex networks===
In 2020, Klein and Hoel extended the method for quantifying causal emergence on Markov chains to [[complex networks]] <ref>Klein B, Hoel E. The emergence of informative higher scales in complex networks[J]. Complexity, 2020, 2020: 1-12.</ref>. The authors defined a [[Markov chain]] on the network with the help of [[random walkers]]: placing random walkers on nodes is equivalent to intervening on those nodes, and the transition probability matrix between nodes is then defined from the random-walk probabilities. The authors also establish a connection between [[effective information]] and the connectivity of the network: connectivity can be characterized by the uncertainty in the weights of the outgoing and incoming edges of nodes, and on this basis the effective information of a complex network is defined. For detailed methods, refer to [[Causal emergence in complex networks]].

The authors conducted experimental comparisons on artificial networks, such as the [[random network]] (ER) and the [[preferential attachment network model]] (PA), as well as four types of real networks, and found the following. For ER networks, the magnitude of effective information depends only on the connection probability <math>p</math>, and as the network size grows it converges to <math>-\log_2p</math>. A key finding is that the EI value exhibits a phase transition point, appearing approximately where the [[average degree]] <math>\langle k \rangle</math> of the network equals <math>\log_2N</math>; this corresponds to the point beyond which, as the connection probability increases, the random network's structure contains no additional information as its scale grows. For preferential attachment networks, when the power-law exponent of the degree distribution satisfies <math>\alpha<1.0</math>, the effective information increases with network size; when <math>\alpha>1.0</math>, the opposite holds; <math>\alpha = 1.0</math>, corresponding to the scale-free network, is exactly the critical boundary between the two growth regimes. For real networks, the authors found that biological networks have the lowest effective information because they contain considerable noise. This noise can, however, be removed through effective coarse-graining, which makes biological networks exhibit more significant causal emergence than the other network types. Technological networks, by contrast, are sparse and non-degenerate, so they have higher average efficiency, more specific node relationships, and the highest effective information, but it is difficult to increase their causal emergence measure through coarse-graining.

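As a concrete illustration, the network EI described above can be computed directly from the random-walk transition matrix. The snippet below is a minimal sketch (the function name and the ER-style construction are illustrative, not code from the paper); it numerically recovers the <math>-\log_2p</math> limit for a dense ER-style network:

```python
import numpy as np

def effective_information(W):
    """EI of a row-stochastic transition matrix W (row i = out-distribution of node i).

    Following Klein & Hoel's network formulation: intervene by placing
    random walkers uniformly on all nodes, then
    EI = H(average out-distribution) - average H(out-distribution).
    """
    def H(p):
        p = p[p > 0]
        return -(p * np.log2(p)).sum()
    return H(W.mean(axis=0)) - np.mean([H(row) for row in W])

rng = np.random.default_rng(0)
N, p = 1000, 0.1
A = (rng.random((N, N)) < p).astype(float)   # ER-style adjacency matrix
A[A.sum(axis=1) == 0, 0] = 1.0               # guard against sink nodes
W = A / A.sum(axis=1, keepdims=True)         # random-walk transition matrix

print(effective_information(W))              # close to -log2(0.1) ~ 3.32
```

For a deterministic, non-degenerate network (e.g. a permutation matrix) the same function returns the maximum <math>\log_2N</math>, while noise or degeneracy in the out-edges lowers it.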
In this article, the authors use a [[greedy algorithm]] to coarse-grain the network; for large-scale networks, however, this algorithm is very inefficient. Griebenow et al. <ref>Griebenow R, Klein B, Hoel E. Finding the right scale of a network: efficient identification of causal emergence through spectral clustering[J]. arXiv preprint arXiv:1908.07565, 2019.</ref> subsequently proposed a method based on [[spectral clustering]] to identify causal emergence in [[preferential attachment networks]]. Compared with the greedy algorithm and the gradient descent algorithm, the spectral clustering algorithm requires less computation time, and the causal emergence of the macroscopic networks it finds is also more significant.

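The effect that such coarse-graining has on EI can be seen in a toy example: lumping a noisy, degenerate cluster of micro nodes into a single macro node raises EI. The partition below is hand-picked rather than found by the greedy or spectral algorithms, and the helper names are illustrative:

```python
import numpy as np

def effective_information(W):
    """EI = H(average out-distribution) - average row entropy."""
    def H(p):
        p = p[p > 0]
        return -(p * np.log2(p)).sum()
    return H(W.mean(axis=0)) - np.mean([H(row) for row in W])

def coarse_grain(W, partition):
    """Lump micro nodes into macro nodes: average the rows within each
    group, then sum the resulting probabilities over each target group."""
    k = len(partition)
    M = np.zeros((k, len(W)))
    for g, nodes in enumerate(partition):
        M[g] = W[nodes].mean(axis=0)
    Wm = np.zeros((k, k))
    for g, nodes in enumerate(partition):
        Wm[:, g] = M[:, nodes].sum(axis=1)
    return Wm

# Micro scale: nodes 0-2 hop uniformly among themselves (noisy and
# degenerate), node 3 maps to itself deterministically.
W = np.array([[1/3, 1/3, 1/3, 0.0],
              [1/3, 1/3, 1/3, 0.0],
              [1/3, 1/3, 1/3, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
Wm = coarse_grain(W, [[0, 1, 2], [3]])

print(effective_information(W))   # ~0.81 bits at the micro scale
print(effective_information(Wm))  # 1.0 bit at the macro scale
```

Searching over partitions for the one that maximizes the macro-scale EI is exactly what the greedy and spectral-clustering algorithms automate.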
===Application on biological networks===
Klein et al. further extended the method of [[causal emergence in complex networks]] to more [[biological networks]]. As mentioned earlier, biological networks are noisier, which makes it difficult to understand their internal operating principles. This noise comes partly from the inherent noise of the system and partly from measurement or observation. Klein et al. <ref>Klein B, Swain A, Byrum T, et al. Exploring noise, degeneracy and determinism in biological networks with the einet package[J]. Methods in Ecology and Evolution, 2022, 13(4): 799-804.</ref> further explored the relationships and specific meanings among noise, [[degeneracy]] and [[determinism]] in biological networks, and drew some interesting conclusions.

For example, high [[determinism]] in a gene expression network can be understood as one gene almost certainly leading to the expression of another. High [[degeneracy]] is likewise widespread in biological systems during evolution. These two factors together mean that it is currently unclear at what scale biological systems should be analyzed to best understand their functions. Klein et al. <ref>Klein B, Hoel E, Swain A, et al. Evolution and emergence: higher order information structure in protein interactomes across the tree of life[J]. Integrative Biology, 2021, 13(12): 283-294.</ref> analyzed [[protein interaction networks]] of more than 1800 species and found that networks at macroscopic scales have less noise and degeneracy. Moreover, compared with nodes that do not participate in macroscopic scales, nodes within macroscopic-scale networks are more resilient. Therefore, to meet the demands of evolution, biological networks need to evolve macroscopic scales that increase certainty, thereby enhancing [[network resilience]] and improving the effectiveness of information transmission.

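The determinism and degeneracy discussed here are the two entropy terms whose difference gives EI. A minimal reimplementation of this decomposition (illustrative only, not the einet package API) makes the trade-off explicit:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def determinism(W):
    """log2(N) minus the average row entropy: how reliably a walker's
    current node fixes its next step (noise lowers this)."""
    return np.log2(len(W)) - np.mean([entropy(row) for row in W])

def degeneracy(W):
    """log2(N) minus the entropy of the average out-distribution: how
    strongly different nodes collapse onto the same targets."""
    return np.log2(len(W)) - entropy(W.mean(axis=0))

# EI = determinism - degeneracy.
noisy = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.5, 0.5, 0.0, 0.0],
                  [0.0, 0.0, 0.5, 0.5],
                  [0.0, 0.0, 0.5, 0.5]])             # noise, no degeneracy
collapsed = np.tile([1.0, 0.0, 0.0, 0.0], (4, 1))    # fully degenerate

print(determinism(noisy) - degeneracy(noisy))          # EI = 1.0 bit
print(determinism(collapsed) - degeneracy(collapsed))  # EI = 0.0 bits
```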
In a further article, Hoel et al. <ref>Hoel E, Levin M. Emergence of informative higher scales in biological systems: a computational toolkit for optimal prediction and control[J]. Communicative & Integrative Biology, 2020, 13(1): 108-118.</ref> studied causal emergence in biological systems with the help of [[effective information]] theory. The authors applied effective information to [[gene regulatory networks]] to identify the most informative model of heart development for controlling mammalian heart development. By quantifying the causal emergence in the [[largest connected component]] of the ''Saccharomyces cerevisiae'' gene network, the article shows that informative macroscopic scales are ubiquitous in biology and that life's mechanisms themselves often operate at macroscopic scales. The article also provides biologists with a computable tool for identifying the most informative macroscopic scale, on which basis complex biological systems can be modeled, predicted, controlled and understood.

===Application on artificial neural networks===
Marrow et al. <ref>Marrow S, Michaud E J, Hoel E. Examining the Causal Structures of Deep Neural Networks Using Information Theory[J]. Entropy, 2020, 22(12): 1429.</ref> tried to introduce [[effective information]] into [[neural networks]] to quantify and track changes in a network's [[causal structure]] during training. Here, effective information is used to evaluate the degree of causal influence that the nodes and edges of each layer exert on its downstream targets. The effective information EI of a layer of the neural network is defined as:

<math>EI = I(L_1;L_2) \,|\, do(L_1=H^{max})</math>

Here, <math>L_1</math> and <math>L_2</math> respectively denote the input and output layers of the neural network. The input layer is do-intervened on as a whole to follow the uniform (maximum-entropy) distribution, and the mutual information between cause and effect is then calculated. [[Effective information]] can be decomposed into sensitivity and degeneracy, where sensitivity is defined as:

<math>Sensitivity = \sum_{(i \in L_1, j \in L_2)} I(t_i;t_j) \,|\, do(t_i=H^{max})</math>

Degeneracy is then the difference between sensitivity and effective information: <math>Degeneracy = Sensitivity - EI</math>.

By observing the effective information during model training, including the changes in [[sensitivity]] and [[degeneracy]], we can assess the generalization ability of the model, helping scholars better understand and explain the working principles of neural networks.

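A toy version of this layer-level EI can be computed exactly for small Boolean layers (Marrow et al. actually estimate it for continuous activations by sampling and binning; the exhaustive enumeration below is an illustrative simplification): do-intervene on the input layer with the uniform, maximum-entropy distribution and measure the input-output mutual information, which for a deterministic layer reduces to the output entropy.

```python
from collections import Counter
from itertools import product
from math import log2

def layer_EI(f, n_inputs):
    """EI of a deterministic layer y = f(x) with binary inputs under
    do(input = uniform): I(X; Y) = H(Y), since H(Y|X) = 0."""
    outs = Counter(f(x) for x in product((0, 1), repeat=n_inputs))
    total = sum(outs.values())
    return -sum(c / total * log2(c / total) for c in outs.values())

# XOR uses its inputs non-degenerately; AND collapses three of the four
# input states onto the same output, lowering EI.
print(layer_EI(lambda x: x[0] ^ x[1], 2))  # 1.0 bit
print(layer_EI(lambda x: x[0] & x[1], 2))  # ~0.81 bits
```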
===Application on the brain nervous system===
The brain nervous system is an emergent multi-scale [[complex system]]. Luppi et al. <ref>Luppi AI, Mediano PA, Rosas FE, Allanson J, Pickard JD, Carhart-Harris RL, Williams GB, Craig MM, Finoia P, Owen AM, Naci L. A synergistic workspace for human consciousness revealed by integrated information decomposition. BioRxiv. 2020 Nov 26:2020-11.</ref> revealed a synergistic workspace of human [[consciousness]] based on [[integrated information decomposition]]. The authors constructed a three-layer architecture of brain cognition comprising the external environment, specific modules and a synergistic global workspace. The working principle of the brain involves three stages: the first stage gathers information from multiple different modules into the workspace; the second integrates the collected information within the workspace; the third broadcasts global information to other parts of the brain. The authors conducted experiments on three types of resting-state fMRI data: 100 normal subjects; 15 subjects in an anesthesia experiment (covering three states: before anesthesia, during anesthesia and after recovery); and 22 subjects with chronic disorders of consciousness (DOC). The article uses [[integrated information decomposition]] to obtain [[synergistic information]] and [[redundant information]], and uses the revised [[integrated information value]] <math>\Phi_R</math> to calculate the synergy and redundancy values between every pair of brain regions, thereby determining whether synergy or redundancy plays the larger role in each brain region. By comparing with data from conscious subjects, they found that the regions where the [[integrated information]] of unconscious subjects was significantly reduced all belonged to brain regions in which [[synergistic information]] plays the larger role, and that these regions belonged to functional networks such as the DMN (Default Mode Network), thus locating brain regions that have a significant effect on the occurrence of consciousness.

===Application in artificial intelligence systems===
The causal emergence theory also has a strong connection with the field of [[artificial intelligence]]. This is manifested in two ways. First, machine learning solutions to the causal emergence identification problem are in fact an application of [[causal representation learning]]. Second, techniques such as effective information maximization are also expected to find applications in fields such as [[causal machine learning]].

====Causal representation learning====
[[Causal representation learning]] is an emerging field in artificial intelligence that attempts to combine two important subfields of machine learning, [[representation learning]] and [[causal inference]], drawing on their respective strengths to automatically extract the important features and causal relationships behind data [55]. Causal emergence identification based on effective information can be viewed as an equivalent causal representation learning task: identifying causal emergence from data is equivalent to learning the underlying causal relationships and causal mechanisms of that data. Specifically, the macroscopic state can be regarded as a causal variable, the macroscopic dynamics as the corresponding causal mechanism, the coarse-graining strategy as an encoding process (a representation) from the raw data to the causal variables, and effective information as a measure of the causal effect strength of the mechanism.

Since the two share many similarities, their techniques and concepts can be borrowed from each other. For example, [[causal representation learning]] techniques can be applied to [[causal emergence identification]]; in turn, the learned abstract causal representation can be interpreted as a macroscopic state, thereby enhancing the interpretability of causal representation learning. However, there are also significant differences, chiefly two: 1) causal representation learning assumes a real [[causal mechanism]] behind the data, which is generated by that mechanism, whereas there may not be a "true causal relationship" between the states and dynamics that emerge at the macroscopic level; 2) the coarse-grained macroscopic state in causal emergence is a low-dimensional description, while causal representation learning imposes no such requirement. From an epistemological perspective, however, there is no difference between the two, because both extract effective information from observational data to obtain representations with stronger causal effects.

====Application of effective information in causal machine learning====
− | Causal emergence can enhance the performance of machine learning in out-of-distribution scenarios. The do-intervention introduced in <math>EI</math> captures the causal dependence in the data generation process and suppresses spurious correlations, thus supplementing machine learning algorithms based on associations and establishing a connection between <math>EI</math> and out-of-distribution generalization (Out Of Distribution, abbreviated as OOD) <ref name="Emergence_and_causality_in_complex_systems">{{cite journal|author1=Yuan, B|author2=Zhang, J|author3=Lyu, A|author4=Wu, J|author5=Wang, Z|author6=Yang, M|author7=Liu, K|author8=Mou, M|author9=Cui, P|title=Emergence and causality in complex systems: A survey of causal emergence and related quantitative studies|journal=Entropy|year=2024|volume=26|issue=2|page=108|url=https://www.mdpi.com/1099-4300/26/2/108}}</ref>. Due to the universality of effective information, causal emergence can be applied to supervised machine learning to evaluate the strength of the causal relationship between the feature space <math>X</math> and the target space <math>Y</math>, thereby improving the prediction accuracy from cause (feature) to result (target). It is worth noting that direct fitting of observations from <math>X</math> to <math>Y</math> is sufficient for common prediction tasks with the i.i.d. assumption, which means that the training data and test data are independently and identically distributed. However, if samples are drawn from outside the training distribution, a generalization representation space from training to test environments must be learned. Since it is generally believed that the generalization of causality is better than statistical correlation <ref>Arjovsky, M.; Bottou, L.; Gulrajani, I.; Lopez-Paz, D. Invariant risk minimization. arXiv 2019, arXiv:1907.02893.</ref>, therefore, the causal emergence theory can serve as a standard for embedding causal relationships in the representation space. 
Causal emergence can enhance the performance of [[machine learning]] in out-of-distribution scenarios. The do-intervention introduced in <math>EI</math> captures the causal dependence in the data generation process and suppresses spurious correlations, thus complementing association-based machine learning algorithms and establishing a connection between <math>EI</math> and [[out-of-distribution generalization]] (Out Of Distribution, abbreviated as OOD) <ref name="Emergence_and_causality_in_complex_systems">{{cite journal|author1=Yuan, B|author2=Zhang, J|author3=Lyu, A|author4=Wu, J|author5=Wang, Z|author6=Yang, M|author7=Liu, K|author8=Mou, M|author9=Cui, P|title=Emergence and causality in complex systems: A survey of causal emergence and related quantitative studies|journal=Entropy|year=2024|volume=26|issue=2|page=108|url=https://www.mdpi.com/1099-4300/26/2/108}}</ref>.

Owing to the universality of [[effective information]], causal emergence can be applied to supervised machine learning to evaluate the strength of the causal relationship between the feature space <math>X</math> and the target space <math>Y</math>, thereby improving the accuracy of predictions from cause (feature) to effect (target). It is worth noting that directly fitting observations from <math>X</math> to <math>Y</math> is sufficient for common prediction tasks under the i.i.d. assumption, that is, when the training data and test data are [[independently and identically distributed]]. However, if samples are drawn from outside the training distribution, a representation space that generalizes from the training to the test environment must be learned. Since causality is generally believed to generalize better than [[statistical correlation]] <ref>Arjovsky, M.; Bottou, L.; Gulrajani, I.; Lopez-Paz, D. Invariant risk minimization. arXiv 2019, arXiv:1907.02893.</ref>, the causal emergence theory can serve as a criterion for embedding causal relationships in the representation space. The occurrence of causal emergence reveals the potential causal factors of the target, thereby producing a robust representation space for out-of-distribution generalization. Causal emergence may thus provide a unified, causality-based measure of representations for out-of-distribution generalization. <math>EI</math> can also be regarded as an information-theoretic abstraction of the reweighting-based debiasing techniques used in out-of-distribution generalization. In addition, we conjecture that out-of-distribution generalization can be achieved while maximizing <math>EI</math>, and that <math>EI</math> may peak at an intermediate stage of feature abstraction, consistent with the idea in OOD generalization that less is more. Ideally, when causal emergence occurs at the peak of <math>EI</math>, all non-causal features are excluded and the causal features are revealed, yielding the most informative representation.
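As a concrete illustration of the quantity being maximized here, <math>EI</math> for a discrete Markov chain under a uniform do-intervention equals the average KL divergence (in bits) between each row of the transition probability matrix and the mean row. The sketch below, in Python with NumPy, follows that standard definition; the two matrices are toy examples of our own, not data from the cited papers.

```python
import numpy as np

def effective_information(tpm):
    """EI in bits: the mutual information I(X_t; X_{t+1}) when X_t is forced
    to be uniform, i.e. the average KL divergence of each row of the
    transition probability matrix from the mean row."""
    tpm = np.asarray(tpm, dtype=float)
    mean_row = tpm.mean(axis=0)  # distribution of X_{t+1} under uniform intervention
    ei = 0.0
    for row in tpm:
        mask = row > 0  # 0 * log(0) = 0 by convention
        ei += np.sum(row[mask] * np.log2(row[mask] / mean_row[mask]))
    return ei / len(tpm)

# A deterministic permutation of 4 states: maximally causal, EI = log2(4) = 2 bits.
deterministic = np.eye(4)[[1, 2, 3, 0]]

# A fully random chain: the past says nothing about the future, EI = 0 bits.
random_walk = np.full((4, 4), 0.25)

print(effective_information(deterministic))  # 2.0
print(effective_information(random_walk))    # 0.0
```

The deterministic chain attains the maximum possible <math>EI</math> for its state count, while the random chain has none — mirroring the claim that maximizing <math>EI</math> filters out non-causal structure.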
=====Causal model abstraction=====
In complex systems, since microscopic states are often noisy, one needs to coarse-grain them into macroscopic states with less noise, so that the macroscopic dynamics exhibit stronger causality. The same holds for causal models that explain various types of data. Because the original model is overly complex or computing resources are limited, one often needs to derive a more abstract causal model while ensuring that it preserves the [[causal mechanism]] of the original model as faithfully as possible. This is the so-called [[causal model abstraction]].
[[Causal model abstraction]] is a subfield of artificial intelligence that plays an important role especially in causal inference and model interpretability. Such abstraction can help us better understand the hidden causal mechanisms in data and the interactions between variables. Causal model abstraction is achieved by optimizing a high-level model so that it simulates the causal effects of a low-level model as faithfully as possible <ref>Beckers, Sander, and Joseph Y. Halpern. "Abstracting causal models." Proceedings of the aaai conference on artificial intelligence. Vol. 33. No. 01. 2019.</ref>. If a high-level model can reproduce the causal effects of a low-level model, we call it a causal abstraction of the low-level model.
Causal model abstraction also concerns the interplay between causal relationships and model abstraction (which can be regarded as a coarse-graining process) <ref>S. Beckers, F. Eberhardt, J. Y. Halpern, Approximate causal abstractions, in: Uncertainty in artificial intelligence, PMLR, 2020, pp. 606–615.</ref>. Causal emergence identification and causal model abstraction therefore share many similarities: the original causal mechanism can be understood as microscopic dynamics, and the abstracted mechanism as macroscopic dynamics. In the [[neural information compression framework]] (NIS), researchers place restrictions on the coarse-graining strategy and the macroscopic dynamics, requiring that the microscopic prediction error of the macroscopic dynamics be small enough to exclude trivial solutions. This requirement resembles causal model abstraction, which demands that the abstracted causal model remain as similar as possible to the original one. However, there are also differences between the two: 1) causal emergence identification coarse-grains states or data, while causal model abstraction coarse-grains models; 2) causal model abstraction accounts for confounding factors, a point that is ignored in discussions of causal emergence identification.
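The parallel between causal model abstraction and coarse-graining can be made numerical. The following Python/NumPy sketch is a toy construction of our own, in the spirit of Hoel's Markov-chain examples rather than code from any cited paper: it coarse-grains a noisy 4-state micro chain into a 2-state macro chain and checks that the abstraction has higher <math>EI</math>, i.e. stronger causality.

```python
import numpy as np

def effective_information(tpm):
    # EI in bits: average KL divergence of each row from the mean row
    # (mutual information under a uniform do-intervention on the state).
    tpm = np.asarray(tpm, dtype=float)
    mean_row = tpm.mean(axis=0)
    ei = 0.0
    for row in tpm:
        mask = row > 0
        ei += np.sum(row[mask] * np.log2(row[mask] / mean_row[mask]))
    return ei / len(tpm)

# Micro model: states {0,1,2} wander uniformly among themselves; state 3 is fixed.
micro = np.array([
    [1/3, 1/3, 1/3, 0],
    [1/3, 1/3, 1/3, 0],
    [1/3, 1/3, 1/3, 0],
    [0,   0,   0,   1],
])

# Coarse-graining: {0,1,2} -> macro state A, {3} -> macro state B.
grouping = [[0, 1, 2], [3]]
macro = np.zeros((2, 2))
for a, group_a in enumerate(grouping):
    row = micro[group_a].mean(axis=0)      # average the micro rows in the group
    for b, group_b in enumerate(grouping):
        macro[a, b] = row[group_b].sum()   # lump columns within each macro state

print(effective_information(micro))  # ~0.81 bits
print(effective_information(macro))  # 1.0 bit: the abstracted model is more causal
```

The macro model turns out to be fully deterministic (a 2×2 identity matrix), so the abstraction gains causal strength exactly as the causal emergence framework predicts.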
=====Reinforcement learning based on world models=====
[[Reinforcement learning]] based on [[world models]] assumes that the reinforcement learning agent contains an internal world model that can simulate the dynamics of the environment the [[intelligent agent]] faces <ref>D. Ha, J. Schmidhuber, World models, arXiv preprint arXiv:1803.10122 (2018).</ref>. The dynamics of the world model can be learned through the interaction between the agent and the environment, helping the [[agent]] plan and make decisions in an uncertain environment. At the same time, to represent a complex environment, the world model must be a coarse-grained description of that environment. A typical world model architecture contains an encoder and a decoder.
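To make the encoder/decoder structure concrete, here is a minimal hypothetical sketch in Python/NumPy. The linear form, the dimensions, and the class name are all illustrative assumptions of ours, not an architecture from the cited papers: the encoder plays the role of coarse-graining, the latent transition matrix plays the role of macro dynamics, and the decoder maps predictions back to observation space.

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyWorldModel:
    """Hypothetical linear world model: encode -> predict in latent -> decode."""

    def __init__(self, obs_dim=4, latent_dim=2):
        self.encoder = rng.normal(size=(latent_dim, obs_dim))      # coarse-graining
        self.dynamics = rng.normal(size=(latent_dim, latent_dim))  # macro dynamics
        self.decoder = rng.normal(size=(obs_dim, latent_dim))      # back to micro

    def step(self, obs):
        z = self.encoder @ obs        # micro observation -> macro latent state
        z_next = self.dynamics @ z    # predict the next macro state
        return self.decoder @ z_next  # decode the prediction to observation space

model = ToyWorldModel()
obs = rng.normal(size=4)
prediction = model.step(obs)
print(prediction.shape)  # (4,)
```

In a real system the three maps would be learned (e.g. deep networks trained on agent–environment interaction data); the point here is only that planning happens in the low-dimensional, coarse-grained latent space.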
Line 690:
====Consciousness research====
First of all, the proposal of the causal emergence theory is closely related to research in consciousness science. This is because [[effective information]], the core indicator of the causal emergence theory, was first proposed by [[Tononi]] in [[integrated information theory]], a quantitative theory of consciousness. It was later modified and applied to Markov chains by [[Erik Hoel]], who thereby proposed the concept of causal emergence. In this sense, effective information is in fact a by-product of quantitative consciousness science.
Line 696:
Thirdly, causal emergence may answer the question of free will. Do people have free will? Are the decisions we make really free choices of our will, or might they be mere illusions? In fact, if we accept the concept of causal emergence and admit that macroscopic variables can exert [[causal force]] on microscopic variables, then all our decisions are actually made spontaneously by the brain system, and consciousness is only an explanation of this complex decision-making process at a certain level. Free will would then be an emergent [[downward causation]]. The answers to these questions await further research on the causal emergence theory.
====Chinese classical philosophy====
Unlike Western science and philosophy, [[Chinese classical philosophy]] preserves a complete and distinct theoretical framework for explaining the universe, including [[yin and yang]], the [[five elements]], the [[eight trigrams]], as well as [[divination]], [[feng shui]], [[traditional Chinese medicine]], etc., and can give entirely independent explanations for various phenomena in the universe. For a long time, these two philosophies of the East and the West have been difficult to integrate. The idea of causal emergence may provide a new explanation that bridges the conflict between Eastern and Western philosophies.
According to the causal emergence theory, the quality of a theory depends on the strength of its causality, that is, the size of <math>EI</math>, and different coarse-graining schemes yield completely different macroscopic theories (macroscopic dynamics). It is very likely that, facing the same complex-system object of study, the Western philosophical and scientific system gives a set of relatively specific, microscopic causal mechanisms (dynamics), while Eastern philosophy gives a set of more coarse-grained macroscopic causal mechanisms. According to the causal emergence theory, or the [[Causal Equivalence Principle]] proposed by Yurchenko, the two may well be compatible with each other. That is to say, for the same set of phenomena, the East and the West can make correct predictions, and even devise intervention methods, according to two different sets of causal mechanisms. Of course, it is also possible that for certain kinds of problems or phenomena a more macroscopic causal mechanism is more explanatory or leads to a good solution, while for others a more microscopic causal mechanism is more favorable.
For example, taking the concept of the [[five elements]] in Eastern philosophy, we can understand the five elements as five macroscopic states of all things, and their relationships of mutual generation and restraint as a macroscopic causal mechanism among these five states. The cognitive process of extracting these five states from all things is then a coarse-graining process, which depends on the observer's capacity for analogy. The theory of the five elements can therefore be regarded as an abstract causal emergence theory for all things. Similarly, we can apply the concept of causal emergence to more fields, including traditional Chinese medicine, divination, feng shui, etc. What these applications have in common is that their causal mechanisms are simpler, and possibly exhibit stronger causality, than those of Western science, but the process of obtaining such an abstract coarse-graining is more complex and depends more on experienced abstractors. This explains why Eastern philosophies all emphasize the self-cultivation of practitioners: these Eastern philosophical theories place an enormous complexity and computational burden on '''analogical thinking'''.
==Critique==