第578行: |
第578行: |
| | | |
| ===Causal emergence in complex networks=== | | ===Causal emergence in complex networks=== |
− | In 2020, Klein and Hoel improved the method of quantifying causal emergence on Markov chains to be applied to complex networks [47]. The authors defined the Markov chain in the network with the help of random walkers, placing random walkers on nodes is equivalent to intervening on nodes, and then defining the transition probability matrix between nodes based on the random walk probability. At the same time, the authors establish a connection between effective information and the connectivity of the network. Connectivity can be characterized by the uncertainty of the weights of the outgoing and incoming edges of nodes. Based on this, the effective information in complex networks is defined. For detailed methods, refer to Causal emergence in complex networks. | + | In 2020, Klein and Hoel improved the method of quantifying causal emergence on Markov chains to be applied to complex networks <ref>Klein B, Hoel E. The emergence of informative higher scales in complex networks[J]. Complexity, 2020, 20201-12.</ref>. The authors defined the Markov chain in the network with the help of random walkers, placing random walkers on nodes is equivalent to intervening on nodes, and then defining the transition probability matrix between nodes based on the random walk probability. At the same time, the authors establish a connection between effective information and the connectivity of the network. Connectivity can be characterized by the uncertainty of the weights of the outgoing and incoming edges of nodes. Based on this, the effective information in complex networks is defined. For detailed methods, refer to Causal emergence in complex networks. |
| | | |
| | | |
第584行: |
第584行: |
| | | |
| | | |
− | In this article, the authors use the greedy algorithm to coarse-grain the network. However, for large-scale networks, this algorithm is very inefficient. Subsequently, Griebenow et al. [48] proposed a method based on spectral clustering to identify causal emergence in preferential attachment networks. Compared with the greedy algorithm and the gradient descent algorithm, the spectral clustering algorithm has less computation time and the causal emergence of the found macroscopic network is also more significant. | + | In this article, the authors use the greedy algorithm to coarse-grain the network. However, for large-scale networks, this algorithm is very inefficient. Subsequently, Griebenow et al. <ref>Griebenow R, Klein B, Hoel E. Finding the right scale of a network: efficient identification of causal emergence through spectral clustering[J]. arXiv preprint arXiv:190807565, 2019.</ref> proposed a method based on spectral clustering to identify causal emergence in preferential attachment networks. Compared with the greedy algorithm and the gradient descent algorithm, the spectral clustering algorithm has less computation time and the causal emergence of the found macroscopic network is also more significant. |
| | | |
| | | |
| ===Application on biological networks=== | | ===Application on biological networks=== |
− | Furthermore, Klein et al. extended the method of causal emergence in complex networks to more biological networks. As mentioned earlier, biological networks have more noise, which makes it difficult for us to understand their internal operating principles. This noise comes from the inherent noise of the system on the one hand, and is introduced by measurement or observation on the other hand. Klein et al. [49] further explored the relationship and specific meanings among noise, degeneracy and determinism in biological networks, and drew some interesting conclusions. | + | Furthermore, Klein et al. extended the method of causal emergence in complex networks to more biological networks. As mentioned earlier, biological networks have more noise, which makes it difficult for us to understand their internal operating principles. This noise comes from the inherent noise of the system on the one hand, and is introduced by measurement or observation on the other hand. Klein et al. <ref>Klein B, Swain A, Byrum T, et al. Exploring noise, degeneracy and determinism in biological networks with the einet package[J]. Methods in Ecology and Evolution, 2022, 13(4): 799-804.</ref> further explored the relationship and specific meanings among noise, degeneracy and determinism in biological networks, and drew some interesting conclusions. |
| | | |
| | | |
− | For example, high determinism in gene expression networks can be understood as one gene almost certainly leading to the expression of another gene. At the same time, high degeneracy is also widespread in biological systems during evolution. These two factors jointly lead to the fact that it is currently not clear at what scale biological systems should be analyzed to better understand their functions. Klein et al. [50] analyzed protein interaction networks of more than 1800 species and found that networks at macroscopic scales have less noise and degeneracy. At the same time, compared with nodes that do not participate in macroscopic scales, nodes in macroscopic scale networks are more resilient. Therefore, in order to meet the requirements of evolution, biological networks need to evolve macroscopic scales to increase certainty to enhance network resilience and improve the effectiveness of information transmission. | + | For example, high determinism in gene expression networks can be understood as one gene almost certainly leading to the expression of another gene. At the same time, high degeneracy is also widespread in biological systems during evolution. These two factors jointly lead to the fact that it is currently not clear at what scale biological systems should be analyzed to better understand their functions. Klein et al. <ref>Klein B, Hoel E, Swain A, et al. Evolution and emergence: higher order information structure in protein interactomes across the tree of life[J]. Integrative Biology, 2021, 13(12): 283-294.</ref> analyzed protein interaction networks of more than 1800 species and found that networks at macroscopic scales have less noise and degeneracy. At the same time, compared with nodes that do not participate in macroscopic scales, nodes in macroscopic scale networks are more resilient. Therefore, in order to meet the requirements of evolution, biological networks need to evolve macroscopic scales to increase certainty to enhance network resilience and improve the effectiveness of information transmission. |
| | | |
| | | |
− | Hoel et al. in the article [51] further studied causal emergence in biological systems with the help of effective information theory. The author applied effective information to gene regulatory networks to identify the most informative heart development model to control the heart development of mammals. By quantifying the causal emergence in the largest connected component of the Saccharomyces cerevisiae gene network, the article reveals that informative macroscopic scales are ubiquitous in biology, and that life mechanisms themselves often operate on macroscopic scales. This article also provides biologists with a computable tool to identify the most informative macroscopic scale, and can model, predict, control and understand complex biological systems on this basis. | + | Hoel et al. in the article <ref>Hoel E, Levin M. Emergence of informative higher scales in biological systems: a computational toolkit for optimal prediction and control[J]. Communicative & Integrative Biology, 2020, 13(1): 108-118.</ref> further studied causal emergence in biological systems with the help of effective information theory. The author applied effective information to gene regulatory networks to identify the most informative heart development model to control the heart development of mammals. By quantifying the causal emergence in the largest connected component of the Saccharomyces cerevisiae gene network, the article reveals that informative macroscopic scales are ubiquitous in biology, and that life mechanisms themselves often operate on macroscopic scales. This article also provides biologists with a computable tool to identify the most informative macroscopic scale, and can model, predict, control and understand complex biological systems on this basis. |
| | | |
| | | |
− | Swain et al. in the article [52] explored the influence of the interaction history of ant colonies on task allocation and task switching, and used effective information to study how noise spreads among ants. The results found that the degree of historical interaction between ant colonies affects task allocation, and the type of ant colony in specific interactions determines the noise in the interaction. In addition, even when ants switch functional groups, the emergent cohesion of ant colonies can ensure the stability of the colony. At the same time, different functional ant colonies also play different roles in maintaining the cohesion of the colony. | + | Swain et al. in the article <ref>Swain A, Williams S D, Di Felice L J, et al. Interactions and information: exploring task allocation in ant colonies using network analysis[J]. Animal Behaviour, 2022, 18969-81.</ref> explored the influence of the interaction history of ant colonies on task allocation and task switching, and used effective information to study how noise spreads among ants. The results found that the degree of historical interaction between ant colonies affects task allocation, and the type of ant colony in specific interactions determines the noise in the interaction. In addition, even when ants switch functional groups, the emergent cohesion of ant colonies can ensure the stability of the colony. At the same time, different functional ant colonies also play different roles in maintaining the cohesion of the colony. |
| | | |
| | | |
| ===Application on artificial neural networks=== | | ===Application on artificial neural networks=== |
− | Marrow et al. in the article [53] tried to introduce effective information into neural networks to quantify and track the changes in the causal structure of neural networks during the training process. Here, effective information is used to evaluate the degree of causal influence of nodes and edges on downstream targets of each layer. The effective information EI of each layer of neural network is defined as: | + | Marrow et al. in the article <ref>Marrow S, Michaud E J, Hoel E. Examining the Causal Structures of Deep Neural Networks Using Information Theory[J]. Entropy, 2020, 22(12): 1429.</ref> tried to introduce effective information into neural networks to quantify and track the changes in the causal structure of neural networks during the training process. Here, effective information is used to evaluate the degree of causal influence of nodes and edges on downstream targets of each layer. The effective information EI of each layer of neural network is defined as: |
| | | |
| | | |
第632行: |
第632行: |
| | | |
| ===Application on the brain nervous system=== | | ===Application on the brain nervous system=== |
− | The brain nervous system is an emergent multi-scale complex system. Luppi et al. [54] Based on integrated information decomposition, the synergistic workspace of human consciousness is revealed. The authors constructed a three-layer architecture of brain cognition, including: external environment, specific modules and synergistic global space. The working principle of the brain mainly includes three stages: the first stage is responsible for collecting information from multiple different modules into the workspace; the second stage is responsible for integrating the collected information in the workspace; the third stage is responsible for broadcasting global information to other parts of the brain. The authors conducted experiments on three types of fMRI data in different resting states, including 100 normal people, 15 subjects participating in anesthesia experiments (including three different states before anesthesia, during anesthesia and recovery), and 22 subjects with chronic disorders of consciousness (DOC). This article uses integrated information decomposition to obtain synergistic information and redundant information, and uses the revised integrated information value <math>\Phi_R</math> to calculate the synergy and redundancy values between each two brain regions, so as to obtain whether the factor that each brain region plays a greater role is synergy or redundancy. At the same time, by comparing the data of conscious people, they found that the regions where the integrated information of unconscious people was significantly reduced all belonged to the brain regions where synergistic information played a greater role. At the same time, they found that the regions where the integrated information was significantly reduced all belonged to functional regions such as DMN (Default Mode Network), thus locating the brain regions that have a significant effect on the occurrence of consciousness. | + | The brain nervous system is an emergent multi-scale complex system. Luppi et al. <ref>Luppi AI, Mediano PA, Rosas FE, Allanson J, Pickard JD, Carhart-Harris RL, Williams GB, Craig MM, Finoia P, Owen AM, Naci L. A synergistic workspace for human consciousness revealed by integrated information decomposition. BioRxiv. 2020 Nov 26:2020-11.</ref> Based on integrated information decomposition, the synergistic workspace of human consciousness is revealed. The authors constructed a three-layer architecture of brain cognition, including: external environment, specific modules and synergistic global space. The working principle of the brain mainly includes three stages: the first stage is responsible for collecting information from multiple different modules into the workspace; the second stage is responsible for integrating the collected information in the workspace; the third stage is responsible for broadcasting global information to other parts of the brain. The authors conducted experiments on three types of fMRI data in different resting states, including 100 normal people, 15 subjects participating in anesthesia experiments (including three different states before anesthesia, during anesthesia and recovery), and 22 subjects with chronic disorders of consciousness (DOC). This article uses integrated information decomposition to obtain synergistic information and redundant information, and uses the revised integrated information value <math>\Phi_R</math> to calculate the synergy and redundancy values between each two brain regions, so as to obtain whether the factor that each brain region plays a greater role is synergy or redundancy. At the same time, by comparing the data of conscious people, they found that the regions where the integrated information of unconscious people was significantly reduced all belonged to the brain regions where synergistic information played a greater role. At the same time, they found that the regions where the integrated information was significantly reduced all belonged to functional regions such as DMN (Default Mode Network), thus locating the brain regions that have a significant effect on the occurrence of consciousness. |
| | | |
| | | |
第669行: |
第669行: |
| | | |
| ====Application of effective information in causal machine learning==== | | ====Application of effective information in causal machine learning==== |
− | Causal emergence can enhance the performance of machine learning in out-of-distribution scenarios. The do-intervention introduced in <math>EI</math> captures the causal dependence in the data generation process and suppresses spurious correlations, thus supplementing machine learning algorithms based on associations and establishing a connection between <math>EI</math> and out-of-distribution generalization (Out Of Distribution, abbreviated as OOD) [56]. Due to the universality of effective information, causal emergence can be applied to supervised machine learning to evaluate the strength of the causal relationship between the feature space <math>X</math> and the target space <math>Y</math>, thereby improving the prediction accuracy from cause (feature) to result (target). It is worth noting that direct fitting of observations from <math>X</math> to <math>Y</math> is sufficient for common prediction tasks with the i.i.d. assumption, which means that the training data and test data are independently and identically distributed. However, if samples are drawn from outside the training distribution, a generalization representation space from training to test environments must be learned. Since it is generally believed that the generalization of causality is better than statistical correlation [57], therefore, the causal emergence theory can serve as a standard for embedding causal relationships in the representation space. The occurrence of causal emergence reveals the potential causal factors of the target, thereby producing a robust representation space for out-of-distribution generalization. Causal emergence may provide a unified representation measure for out-of-distribution generalization based on causal theory. <math>EI</math> can also be regarded as an information-theoretic abstraction of the out-of-distribution generalization's reweighting-based debiasing technique. In addition, we conjecture that out-of-distribution generalization can be achieved while maximizing <math>EI</math>, and <math>EI</math> may reach its peak at the intermediate stage of the original feature abstraction, which is consistent with the idea of OOD generalization, that is, less is more. Ideally, when causal emergence occurs at the peak of <math>EI</math>, all non-causal features are excluded and causal features are revealed, resulting in the most informative representation. | + | Causal emergence can enhance the performance of machine learning in out-of-distribution scenarios. The do-intervention introduced in <math>EI</math> captures the causal dependence in the data generation process and suppresses spurious correlations, thus supplementing machine learning algorithms based on associations and establishing a connection between <math>EI</math> and out-of-distribution generalization (Out Of Distribution, abbreviated as OOD) <ref name="Emergence_and_causality_in_complex_systems">{{cite journal|author1=Yuan, B|author2=Zhang, J|author3=Lyu, A|author4=Wu, J|author5=Wang, Z|author6=Yang, M|author7=Liu, K|author8=Mou, M|author9=Cui, P|title=Emergence and causality in complex systems: A survey of causal emergence and related quantitative studies|journal=Entropy|year=2024|volume=26|issue=2|page=108|url=https://www.mdpi.com/1099-4300/26/2/108}}</ref>. Due to the universality of effective information, causal emergence can be applied to supervised machine learning to evaluate the strength of the causal relationship between the feature space <math>X</math> and the target space <math>Y</math>, thereby improving the prediction accuracy from cause (feature) to result (target). It is worth noting that direct fitting of observations from <math>X</math> to <math>Y</math> is sufficient for common prediction tasks with the i.i.d. assumption, which means that the training data and test data are independently and identically distributed. However, if samples are drawn from outside the training distribution, a generalization representation space from training to test environments must be learned. Since it is generally believed that the generalization of causality is better than statistical correlation <ref>Arjovsky, M.; Bottou, L.; Gulrajani, I.; Lopez-Paz, D. Invariant risk minimization. arXiv 2019, arXiv:1907.02893.</ref>, therefore, the causal emergence theory can serve as a standard for embedding causal relationships in the representation space. The occurrence of causal emergence reveals the potential causal factors of the target, thereby producing a robust representation space for out-of-distribution generalization. Causal emergence may provide a unified representation measure for out-of-distribution generalization based on causal theory. <math>EI</math> can also be regarded as an information-theoretic abstraction of the out-of-distribution generalization's reweighting-based debiasing technique. In addition, we conjecture that out-of-distribution generalization can be achieved while maximizing <math>EI</math>, and <math>EI</math> may reach its peak at the intermediate stage of the original feature abstraction, which is consistent with the idea of OOD generalization, that is, less is more. Ideally, when causal emergence occurs at the peak of <math>EI</math>, all non-causal features are excluded and causal features are revealed, resulting in the most informative representation. |
| | | |
| | | |
第676行: |
第676行: |
| | | |
| | | |
− | Causal model abstraction belongs to a subfield of artificial intelligence and plays an important role especially in causal inference and model interpretability. This abstraction can help us better understand the hidden causal mechanisms in the data and the interactions between variables. Causal model abstraction is achieved by evaluating the optimization of a high-level model to simulate the causal effects of a low-level model as much as possible [58]. If a high-level model can generalize the causal effects of a low-level model, we call this high-level model a causal abstraction of the low-level model. | + | Causal model abstraction belongs to a subfield of artificial intelligence and plays an important role especially in causal inference and model interpretability. This abstraction can help us better understand the hidden causal mechanisms in the data and the interactions between variables. Causal model abstraction is achieved by evaluating the optimization of a high-level model to simulate the causal effects of a low-level model as much as possible <ref>Beckers, Sander, and Joseph Y. Halpern. "Abstracting causal models." Proceedings of the aaai conference on artificial intelligence. Vol. 33. No. 01. 2019.</ref>. If a high-level model can generalize the causal effects of a low-level model, we call this high-level model a causal abstraction of the low-level model. |
| | | |
| | | |
− | Causal model abstraction also discusses the interaction between causal relationships and model abstraction (which can be regarded as a coarse-graining process) [59]. Therefore, causal emergence identification and causal model abstraction have many similarities. The original causal mechanism can be understood as microscopic dynamics, and the abstracted mechanism can be understood as macroscopic dynamics. In the neural information compression framework (NIS), researchers place restrictions on coarse-graining strategies and macroscopic dynamics, requiring that the microscopic prediction error of macroscopic dynamics be small enough to exclude trivial solutions. This requirement is also similar to causal model abstraction, which hopes that the abstracted causal model is as similar as possible to the original model. However, there are also some differences between the two: 1) Causal emergence identification is to coarse-grain states or data, while causal model abstraction is to perform coarse-graining operations on models; 2) Causal model abstraction considers confounding factors, but this point is ignored in the discussion of causal emergence identification. | + | Causal model abstraction also discusses the interaction between causal relationships and model abstraction (which can be regarded as a coarse-graining process) <ref>S. Beckers, F. Eberhardt, J. Y. Halpern, Approximate causal abstractions, in: Uncertainty in artificial intelligence, PMLR, 2020, pp. 606–615.</ref>. Therefore, causal emergence identification and causal model abstraction have many similarities. The original causal mechanism can be understood as microscopic dynamics, and the abstracted mechanism can be understood as macroscopic dynamics. In the neural information compression framework (NIS), researchers place restrictions on coarse-graining strategies and macroscopic dynamics, requiring that the microscopic prediction error of macroscopic dynamics be small enough to exclude trivial solutions. This requirement is also similar to causal model abstraction, which hopes that the abstracted causal model is as similar as possible to the original model. However, there are also some differences between the two: 1) Causal emergence identification is to coarse-grain states or data, while causal model abstraction is to perform coarse-graining operations on models; 2) Causal model abstraction considers confounding factors, but this point is ignored in the discussion of causal emergence identification. |
| | | |
| | | |
| =====Reinforcement learning based on world models===== | | =====Reinforcement learning based on world models===== |
− | Reinforcement learning based on world models assumes that there is a world model inside the reinforcement learning agent, so that it can simulate the dynamics of the environment faced by the intelligent agent [60]. The dynamics of the world model can be learned through the interaction between the agent and the environment, thereby helping the agent to plan and make decisions in an uncertain environment. At the same time, in order to represent a complex environment, the world model must be a coarse-grained description of the environment. A typical world model architecture always contains an encoder and a decoder. | + | Reinforcement learning based on world models assumes that there is a world model inside the reinforcement learning agent, so that it can simulate the dynamics of the environment faced by the intelligent agent <ref>D. Ha, J. Schmidhuber, World models, arXiv preprint arXiv:1803.10122 (2018).</ref>. The dynamics of the world model can be learned through the interaction between the agent and the environment, thereby helping the agent to plan and make decisions in an uncertain environment. At the same time, in order to represent a complex environment, the world model must be a coarse-grained description of the environment. A typical world model architecture always contains an encoder and a decoder. |
| | | |
| | | |
第720行: |
第720行: |
| | | |
| | | |
− | For example, Yurchenko pointed out in the literature [61] that the concept of "causation" is often ambiguous and should be distinguished into two different concepts of "cause" and "reason", which respectively conform to ontological and epistemological causality. Among them, cause refers to the real cause that fully leads to the result, while reason is only the observer's explanation of the result. Reason may not be as strict as a real cause, but it does provide a certain degree of predictability. Similarly, there is also a debate about the nature of causal emergence. | + | For example, Yurchenko pointed out in the literature <ref>Yurchenko, S. B. (2023). Can there be a synergistic core emerging in the brain hierarchy to control neural activity by downward causation?. Authorea Preprints.</ref> that the concept of "causation" is often ambiguous and should be distinguished into two different concepts of "cause" and "reason", which respectively conform to ontological and epistemological causality. Among them, cause refers to the real cause that fully leads to the result, while reason is only the observer's explanation of the result. Reason may not be as strict as a real cause, but it does provide a certain degree of predictability. Similarly, there is also a debate about the nature of causal emergence. |
| | | |
| | | |
第726行: |
第726行: |
| | | |
| | | |
− | Dewhurst [62] provides a philosophical clarification of Hoel's theory, arguing that it is epistemological rather than ontological. This indicates that Hoel's macroscopic causality is only a causal explanation based on information theory and does not involve "true causality". This also raises questions about the assumption of uniform distribution (see the entry for effective information), as there is no evidence that it should be superior to other distributions. | + | Dewhurst <ref>Dewhurst, J. (2021). Causal emergence from effective information: Neither causal nor emergent?. Thought: A Journal of Philosophy, 10(3), 158-168.</ref> provides a philosophical clarification of Hoel's theory, arguing that it is epistemological rather than ontological. This indicates that Hoel's macroscopic causality is only a causal explanation based on information theory and does not involve "true causality". This also raises questions about the assumption of uniform distribution (see the entry for effective information), as there is no evidence that it should be superior to other distributions. |
| | | |
| | | |
第732行: |
第732行: |
| | | |
| | | |
− | At the same time, it is pointed out that Hoel's theory ignores the constraints on the coarse-graining method, and some coarse-graining methods may lead to ambiguity [63]. In addition, some combinations of state coarse-graining operations and time coarse-graining operations do not exhibit commutativity. For example, assume that <math>A_{m\times n}</math> is a state coarse-graining operation (combining n states into m states). Here, the coarse-graining strategy is the strategy that maximizes the effective information of the macroscopic state transition matrix. <math>(\cdot) \times (\cdot)</math> is a time coarse-graining operation (combining two time steps into one). In this way, [math]A_{m\times n}(TPM_{n\times n})[/math] is to perform coarse-graining on a [math]n\times n[/math] TPM, and the coarse-graining process is simplified as the product of matrix [math]A[/math] and matrix [math]TPM[/math]. | + | At the same time, it is pointed out that Hoel's theory ignores the constraints on the coarse-graining method, and some coarse-graining methods may lead to ambiguity <ref>Eberhardt, F., & Lee, L. L. (2022). Causal emergence: When distortions in a map obscure the territory. Philosophies, 7(2), 30.</ref>. In addition, some combinations of state coarse-graining operations and time coarse-graining operations do not exhibit commutativity. For example, assume that <math>A_{m\times n}</math> is a state coarse-graining operation (combining n states into m states). Here, the coarse-graining strategy is the strategy that maximizes the effective information of the macroscopic state transition matrix. <math>(\cdot) \times (\cdot)</math> is a time coarse-graining operation (combining two time steps into one). In this way, [math]A_{m\times n}(TPM_{n\times n})[/math] is to perform coarse-graining on a [math]n\times n[/math] TPM, and the coarse-graining process is simplified as the product of matrix [math]A[/math] and matrix [math]TPM[/math]. |
| | | |
| | | |
第748行: |
第748行: |
| | | |
| | | |
− | However, as pointed out in the literature [40], the above problem can be alleviated by considering the error factor of the model while maximizing EI in the continuous variable space. However, although machine learning techniques facilitate the learning of causal relationships and causal mechanisms and the identification of emergent properties, it is important whether the results obtained through machine learning reflect ontological causality and emergence, or are they just an epistemological phenomenon? This is still undecided. Although the introduction of machine learning does not necessarily solve the debate around ontological and epistemological causality and emergence, it can provide a dependence that helps reduce subjectivity. This is because the machine learning agent can be regarded as an "objective" observer who makes judgments about causality and emergence that are independent of human observers. However, the problem of a unique solution still exists in this method. Is the result of machine learning ontological or epistemological? The answer is that the result is epistemological, where the epistemic subject is the machine learning algorithm. However, this does not mean that all results of machine learning are meaningless, because if the learning subject is well trained and the defined mathematical objective is effectively optimized, then the result can also be considered objective because the algorithm itself is objective and transparent. Combining machine learning methods can help us establish a theoretical framework for observers and study the interaction between observers and the corresponding observed complex systems. | + | However, as pointed out in the literature <ref name=":6" />, the above problem can be alleviated by considering the error factor of the model while maximizing EI in the continuous variable space. However, although machine learning techniques facilitate the learning of causal relationships and causal mechanisms and the identification of emergent properties, it is important whether the results obtained through machine learning reflect ontological causality and emergence, or are they just an epistemological phenomenon? This is still undecided. Although the introduction of machine learning does not necessarily solve the debate around ontological and epistemological causality and emergence, it can provide a dependence that helps reduce subjectivity. This is because the machine learning agent can be regarded as an "objective" observer who makes judgments about causality and emergence that are independent of human observers. However, the problem of a unique solution still exists in this method. Is the result of machine learning ontological or epistemological? The answer is that the result is epistemological, where the epistemic subject is the machine learning algorithm. However, this does not mean that all results of machine learning are meaningless, because if the learning subject is well trained and the defined mathematical objective is effectively optimized, then the result can also be considered objective because the algorithm itself is objective and transparent. Combining machine learning methods can help us establish a theoretical framework for observers and study the interaction between observers and the corresponding observed complex systems. |
| | | |
| | | |
第756行: |
第756行: |
| | | |
| ===Reduction of dynamical models=== | | ===Reduction of dynamical models=== |
− | An important indicator of causal emergence is the selection of coarse-graining strategies. When the microscopic model is known, coarse-graining the microscopic state is equivalent to performing '''model reduction''' on the microscopic model. Model reduction is an important subfield in control theory. Antoulas once wrote a related review article [64]. | + | An important indicator of causal emergence is the selection of coarse-graining strategies. When the microscopic model is known, coarse-graining the microscopic state is equivalent to performing '''model reduction''' on the microscopic model. Model reduction is an important subfield in control theory. Antoulas once wrote a related review article <ref name=":15">Antoulas A C. An overview of approximation methods for large-scale dynamical systems[J]. Annual reviews in Control, 2005, 29(2): 181-190.</ref>. |
| | | |
| | | |
− | Model reduction is to simplify and reduce the dimension of the high-dimensional complex system dynamics model and describe the evolution law of the original system with low-dimensional dynamics. This process is actually the coarse-graining process in the study of causal emergence. There are mainly two types of approximation methods for large-scale dynamical systems, namely approximation methods based on singular value decomposition [64][65] and approximation methods based on Krylov [64][66][67]. The former is based on singular value decomposition, and the latter is based on moment matching. Although the former has many ideal properties, including error bounds, it cannot be applied to systems with high complexity. On the other hand, the advantage of the latter is that it can be implemented iteratively and is therefore suitable for high-dimensional complex systems. Combining the advantages of these two methods gives rise to a third type of approximation method, namely the SVD/Krylov method [68][69]. Both methods evaluate the model reduction effect based on the error loss function of the output function before and after coarse-graining. Therefore, the goal of model reduction is to find the reduced parameter matrix that minimizes the error. | + | Model reduction is to simplify and reduce the dimension of the high-dimensional complex system dynamics model and describe the evolution law of the original system with low-dimensional dynamics. This process is actually the coarse-graining process in the study of causal emergence. There are mainly two types of approximation methods for large-scale dynamical systems, namely approximation methods based on singular value decomposition <ref name=":15" /><ref>Gallivan K, Grimme E, Van Dooren P. Asymptotic waveform evaluation via a Lanczos method[J]. Applied Mathematics Letters, 1994, 7(5): 75-80.</ref> and approximation methods based on Krylov <ref name=":15" /><ref name=":17">CHRISTIAN DE VILLEMAGNE & ROBERT E. SKELTON (1987) Model reductions using a projection formulation, International Journal of Control, 46:6, 2141-2169, DOI: 10.1080/00207178708934040 </ref><ref>Boley D L. Krylov space methods on state-space control models[J]. Circuits, Systems and Signal Processing, 1994, 13: 733-758.</ref>. The former is based on singular value decomposition, and the latter is based on moment matching. Although the former has many ideal properties, including error bounds, it cannot be applied to systems with high complexity. On the other hand, the advantage of the latter is that it can be implemented iteratively and is therefore suitable for high-dimensional complex systems. Combining the advantages of these two methods gives rise to a third type of approximation method, namely the SVD/Krylov method <ref>Gugercin S. An iterative SVD-Krylov based method for model reduction of large-scale dynamical systems[J]. Linear Algebra and its Applications, 2008, 428(8-9): 1964-1986.</ref><ref>Khatibi M, Zargarzadeh H, Barzegaran M. Power system dynamic model reduction by means of an iterative SVD-Krylov model reduction method[C]//2016 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT). IEEE, 2016: 1-6.</ref>. Both methods evaluate the model reduction effect based on the error loss function of the output function before and after coarse-graining. Therefore, the goal of model reduction is to find the reduced parameter matrix that minimizes the error. |
| | | |
| | | |
− | In general, the error loss function of the output function before and after model reduction can be used to judge the coarse-graining parameters. This process defaults that the system reduction process will lose information. Therefore, minimizing the error is the only way to judge the effectiveness of the reduction method. However, from the perspective of causal emergence, effective information will increase due to dimensionality reduction. This is also the biggest difference between the coarse-graining strategy in causal emergence research and model reduction in control theory. When the dynamical system is a stochastic system [70], directly calculating the loss function will lead to unstable due to the existence of randomness, so the effectiveness of reduction cannot be accurately measured. The effective information and causal emergence index based on stochastic dynamical systems can increase the effectiveness of evaluation indicators to a certain extent and make the control research of stochastic dynamical systems more rigorous. | + | In general, the error loss function of the output function before and after model reduction can be used to judge the coarse-graining parameters. This process defaults that the system reduction process will lose information. Therefore, minimizing the error is the only way to judge the effectiveness of the reduction method. However, from the perspective of causal emergence, effective information will increase due to dimensionality reduction. This is also the biggest difference between the coarse-graining strategy in causal emergence research and model reduction in control theory. When the dynamical system is a stochastic system <ref name=":17" />, directly calculating the loss function will lead to unstable due to the existence of randomness, so the effectiveness of reduction cannot be accurately measured. The effective information and causal emergence index based on stochastic dynamical systems can increase the effectiveness of evaluation indicators to a certain extent and make the control research of stochastic dynamical systems more rigorous. |
| | | |
| | | |
| ===Dynamic mode decomposition=== | | ===Dynamic mode decomposition=== |
− | In addition to the reduction of dynamical models, dynamic mode decomposition is also closely related to coarse-graining. The basic idea of the dynamic mode decomposition (DMD) [71][72] model is to directly obtain the dynamic information of the flow in the flow field from the data and find the data mapping according to the flow field changes of different frequencies. This method is based on transforming nonlinear infinite-dimensional dynamics into finite-dimensional linear dynamics, and adopts the ideas of Arnoldi method and singular value decomposition for dimensionality reduction. It draws on many key features of time series such as ARIMA, SARIMA and seasonal models, and is widely used in fields such as mathematics, physics, and finance [73]. Dynamic mode decomposition sorts the system according to frequency and extracts the eigenfrequency of the system to observe the contribution of flow structures of different frequencies to the flow field. At the same time, the dynamic mode decomposition modal eigenvalue can predict the flow field. Because the dynamic mode decomposition algorithm has the advantages of theoretical rigor, stability, and simplicity. While being continuously applied, the dynamic mode decomposition algorithm is also continuously improved on the original basis. For example, it is combined with the SPA test to verify the strong effectiveness of the stock price prediction comparison benchmark point and by connecting the dynamic mode decomposition algorithm and spectral research. The way to simulate the vibration mode of the stock market in the circular economy. These applications can effectively collect and analyze data and finally obtain results. | + | In addition to the reduction of dynamical models, dynamic mode decomposition is also closely related to coarse-graining. The basic idea of the dynamic mode decomposition (DMD) <ref>Schmid P J. Dynamic mode decomposition and its variants[J]. Annual Review of Fluid Mechanics, 2022, 54(1): 225-254.</ref><ref>J. Proctor, S. Brunton and J. N. Kutz, Dynamic mode decomposition with control, arXiv:1409.6358</ref> model is to directly obtain the dynamic information of the flow in the flow field from the data and find the data mapping according to the flow field changes of different frequencies. This method is based on transforming nonlinear infinite-dimensional dynamics into finite-dimensional linear dynamics, and adopts the ideas of Arnoldi method and singular value decomposition for dimensionality reduction. It draws on many key features of time series such as ARIMA, SARIMA and seasonal models, and is widely used in fields such as mathematics, physics, and finance <ref>J. Grosek and J. N. Kutz, Dynamic mode decomposition for real-time background/foreground separation in video, arXiv:1404.7592.</ref>. Dynamic mode decomposition sorts the system according to frequency and extracts the eigenfrequency of the system to observe the contribution of flow structures of different frequencies to the flow field. At the same time, the dynamic mode decomposition modal eigenvalue can predict the flow field. Because the dynamic mode decomposition algorithm has the advantages of theoretical rigor, stability, and simplicity. While being continuously applied, the dynamic mode decomposition algorithm is also continuously improved on the original basis. For example, it is combined with the SPA test to verify the strong effectiveness of the stock price prediction comparison benchmark point and by connecting the dynamic mode decomposition algorithm and spectral research. The way to simulate the vibration mode of the stock market in the circular economy. These applications can effectively collect and analyze data and finally obtain results. |
| | | |
| | | |
− | Dynamic mode decomposition is a method of reducing the dimension of variables, dynamics, and observation functions simultaneously by using linear transformation [74]. This method is another method similar to the coarse-graining strategy in causal emergence, which takes minimizing error as the main goal for optimization. Although both model reduction and dynamic mode decomposition are very close to model coarse-graining, they are not optimized based on effective information. In essence, they both default to a certain degree of information loss and will not enhance causal effects. In the literature [75], the authors proved that in fact the error minimization solution set contains the optimal solution set of maximizing effective information. Therefore, if we want to optimize causal emergence, we can first minimize the error and find the best coarse-graining strategy in the error minimization solution set. | + | Dynamic mode decomposition is a method of reducing the dimension of variables, dynamics, and observation functions simultaneously by using linear transformation <ref>B. Brunton, L. Johnson, J. Ojemann and J. N. Kutz, Extracting spatial-temporal coherent patterns in large-scale neural recordings using dynamic mode decomposition arXiv:1409.5496</ref>. This method is another method similar to the coarse-graining strategy in causal emergence, which takes minimizing error as the main goal for optimization. Although both model reduction and dynamic mode decomposition are very close to model coarse-graining, they are not optimized based on effective information. In essence, they both default to a certain degree of information loss and will not enhance causal effects. In the literature <ref>Liu K, Yuan B, Zhang J. An Exact Theory of Causal Emergence for Linear Stochastic Iteration Systems[J]. arXiv preprint arXiv:2405.09207, 2024.</ref>, the authors proved that in fact the error minimization solution set contains the optimal solution set of maximizing effective information. Therefore, if we want to optimize causal emergence, we can first minimize the error and find the best coarse-graining strategy in the error minimization solution set. |
| | | |
| | | |
| ===Simplification of Markov chains=== | | ===Simplification of Markov chains=== |
− | The simplification of Markov chains (or called coarse-graining of Markov chains) is also importantly related to causal emergence. The coarse-graining process in causal emergence is essentially the simplification of Markov chains. Model simplification of Markov processes [76] is an important problem in state transition system modeling. It reduces the complexity of Markov chains by merging multiple states into one state. | + | The simplification of Markov chains (or called coarse-graining of Markov chains) is also importantly related to causal emergence. The coarse-graining process in causal emergence is essentially the simplification of Markov chains. Model simplification of Markov processes <ref>Zhang A, Wang M. Spectral state compression of markov processes[J]. IEEE transactions on information theory, 2019, 66(5): 3202-3231.</ref> is an important problem in state transition system modeling. It reduces the complexity of Markov chains by merging multiple states into one state. |
| | | |
| | | |
− | There are mainly three meanings of simplification. First, when we study a very large-scale system, we will not pay attention to the changes of each microscopic state. Therefore, in coarse-graining, we hope to filter out some noise and heterogeneity that we are not interested in, and summarize some mesoscale or macroscopic laws from the microscopic scale. Second, some state transition probabilities are very similar, so they can be regarded as the same kind of state. Clustering this kind of state (also called partitioning the state) to obtain a new smaller Markov chain can reduce the redundancy of system representation. Third, in reinforcement learning using Markov decision processes, coarse-graining the Markov chain can reduce the size of the state space and improve training efficiency. In many literatures, coarse-graining and dimension reduction are equivalent [77]. | + | There are mainly three meanings of simplification. First, when we study a very large-scale system, we will not pay attention to the changes of each microscopic state. Therefore, in coarse-graining, we hope to filter out some noise and heterogeneity that we are not interested in, and summarize some mesoscale or macroscopic laws from the microscopic scale. Second, some state transition probabilities are very similar, so they can be regarded as the same kind of state. Clustering this kind of state (also called partitioning the state) to obtain a new smaller Markov chain can reduce the redundancy of system representation. Third, in reinforcement learning using Markov decision processes, coarse-graining the Markov chain can reduce the size of the state space and improve training efficiency. In many literatures, coarse-graining and dimension reduction are equivalent <ref>Coarse graining. ''Encyclopedia of Mathematics.'' URL: <nowiki>http://encyclopediaofmath.org/index.php?title=Coarse_graining&oldid=16170</nowiki></ref>. |
| | | |
| | | |
第788行: |
第788行: |
| | | |
| | | |
− | For any hard partition of states, we can define the so-called concept of lumpability. Lumpability is a measure of clustering. This concept first appeared in Kemeny, Snell's Finite Markov Chains in 1969 [78]. Lumpability is a mathematical condition used to judge "whether a certain hard-blocked microscopic state grouping scheme is reducible to the microscopic state transition matrix". No matter which hard-blocking scheme the state space is classified according to, it has a corresponding coarse-graining scheme for the transition matrix and probability space [79]. | + | For any hard partition of states, we can define the so-called concept of lumpability. Lumpability is a measure of clustering. This concept first appeared in Kemeny, Snell's Finite Markov Chains in 1969 <ref name=":33">Kemeny, John G., and J. Laurie Snell. ''Finite markov chains''. Vol. 26. Princeton, NJ: van Nostrand, 1969. https://www.math.pku.edu.cn/teachers/yaoy/Fall2011/Kemeny-Snell_Chapter6.3-4.pdf</ref>. Lumpability is a mathematical condition used to judge "whether a certain hard-blocked microscopic state grouping scheme is reducible to the microscopic state transition matrix". No matter which hard-blocking scheme the state space is classified according to, it has a corresponding coarse-graining scheme for the transition matrix and probability space <ref>Buchholz, Peter. "Exact and ordinary lumpability in finite Markov chains." ''Journal of applied probability'' 31.1 (1994): 59-75.</ref>. |
| | | |
| | | |
− | Suppose a grouping method '''<math>A=\{A_1, A_2,...,A_r\}</math>''' is given for the Markov state space '''<math>A</math>'''. Here [math]A_i[/math] is any subset of the state space '''<math>A</math>''' and satisfies [math]A_i\cap A_j= \Phi[/math], where [math]\Phi[/math] represents the empty set. [math]\displaystyle{ s^{(t)} }[/math] represents the microscopic state of the system at time [math]\displaystyle{ t }[/math]. The microscopic state space is [math]\displaystyle{ S=\{s_1, s_2,...,s_n\} }[/math] and the microscopic state '''<math>s_i\in A</math>''' are all independent elements in the Markov state space. Let the transition probability from microscopic state <math>s_k</math> to <math>s_m</math> be <math>p_{s_k \rightarrow s_m} = p(s^{(t)} = s_m | s^{(t-1)} = s_k)</math>, and the transition probability from microscopic state <math>s_k</math> to macroscopic state <math>A_i</math> be <math>p_{s_k \rightarrow A_i} = p(s^{(t)} \in A_i | s^{(t-1)} = s_k)</math>. Then the necessary and sufficient condition for lumpability is that for any pair <math>A_i, A_j</math>, <math>p_{s_k \rightarrow A_j}</math> of every state <math>s_k</math> belonging to <math>A_i</math> is equal, that is {{NumBlk|:| | + | Suppose a grouping method '''<math>A=\{A_1, A_2,...,A_r\}</math>''' is given for the Markov state space '''<math>A</math>'''. Here [math]A_i[/math] is any subset of the state space '''<math>A</math>''' and satisfies [math]A_i\cap A_j= \Phi[/math], where [math]\Phi[/math] represents the empty set. [math]\displaystyle{ s^{(t)} }[/math] represents the microscopic state of the system at time [math]\displaystyle{ t }[/math]. The microscopic state space is [math]\displaystyle{ S=\{s_1, s_2,...,s_n\} }[/math] and the microscopic state '''<math>s_i\in A</math>''' are all independent elements in the Markov state space. Let the transition probability from microscopic state <math>s_k</math> to <math>s_m</math> be <math>p_{s_k \rightarrow s_m} = p(s^{(t)} = s_m | s^{(t-1)} = s_k)</math>, and the transition probability from microscopic state <math>s_k</math> to macroscopic state <math>A_i</math> be <math>p_{s_k \rightarrow A_i} = p(s^{(t)} \in A_i | s^{(t-1)} = s_k)</math>. Then the necessary and sufficient condition for lumpability is that for any pair <math>A_i, A_j</math>, <math>p_{s_k \rightarrow A_j}</math> of every state <math>s_k</math> belonging to <math>A_i</math> is equal, that is: |
| + | |
| + | |
| + | {{NumBlk|:| |
| <math> | | <math> |
| \begin{aligned} | | \begin{aligned} |
第797行: |
第800行: |
| \end{aligned} | | \end{aligned} |
| </math> | | </math> |
− | |{{EquationRef|4}}}}For specific methods of coarse-graining Markov chains, please refer to coarse-graining of Markov chains. | + | |{{EquationRef|4}}}} |
| + | |
| + | |
| + | For specific methods of coarse-graining Markov chains, please refer to coarse-graining of Markov chains. |