==Identification of Causal Emergence==
Some works on quantifying emergence through causal measures and other information-theoretic indicators have been introduced previously. In practical applications, however, we can often only collect observational data and cannot obtain the true dynamics of the system. Identifying from observable data whether causal emergence has occurred is therefore the more important problem. The following introduces two families of identification methods: approximate methods based on [[Rosas causal emergence theory]] (one based on a mutual information approximation and one based on [[machine learning]]), and the [[neural information compression]] (NIS, NIS+) methods proposed by Chinese scholars.
       
====Approximate Method Based on Rosas Causal Emergence Theory====
[[Rosas's causal emergence theory]] includes a quantification method based on [[synergistic information]] and one based on [[unique information]]. The second method bypasses the combinatorial explosion over multivariate variables, but it depends on the coarse-graining method and on the choice of the macroscopic state variable <math>V</math>. To address this, the authors give two solutions: either the researcher specifies a macroscopic state <math>V</math> directly, or a machine learning-based method lets the system automatically learn <math>V</math> by maximizing <math>\mathrm{\Psi}</math>. We now introduce these two methods in turn:
       
=====Method Based on Mutual Information Approximation=====
Although [[Rosas's causal emergence theory]] gives a rigorous definition of causal emergence, its calculation involves a combinatorial explosion over many variables, which makes the method difficult to apply to real systems. To solve this problem, Rosas et al. bypassed the exact calculation of unique information and synergistic information <ref name=":5" /> and proposed an approximate formula that only requires calculating [[mutual information]], from which they derived a sufficient condition for the occurrence of causal emergence.
The authors proposed three new indicators based on [[mutual information]], <math>\mathrm{\Psi}</math>, <math>\mathrm{\Delta}</math> and <math>\mathrm{\Gamma}</math>, which can be used to identify causal emergence, [[causal decoupling]] and [[downward causation]] in the system respectively. The specific calculation formulas of the three indicators are as follows:
* Indicator for judging [[downward causation]]:
* Indicator for judging [[causal decoupling]]:
In summary, this method is convenient to compute because it is based on mutual information and makes no Markov assumption about the system's dynamics. However, it also has several shortcomings: 1) the three indicators <math>\mathrm{\Psi}</math>, <math>\mathrm{\Delta}</math> and <math>\mathrm{\Gamma}</math> are computed purely from [[mutual information]] and do not capture causality; 2) the method only yields a sufficient condition for the occurrence of causal emergence; 3) the method depends on the choice of macroscopic variables, and different choices significantly change the results; 4) when the system contains a large amount of redundant information or many variables, the [[computational complexity]] of this method becomes very high. Moreover, since <math>\Psi</math> is an approximation, it carries a very large error in high-dimensional systems and easily takes negative values, in which case no conclusion can be drawn about whether causal emergence occurs.
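Despite these caveats, the <math>\Psi</math> criterion itself is straightforward to estimate from data. The following sketch (an illustration, not code from the cited works) computes <math>\Psi = I(V_t;V_{t+1}) - \sum_i I(X_t^i;V_{t+1})</math> with a plug-in histogram estimate of mutual information, on a toy system where the parity of two bits is a synergistic macroscopic variable:

```python
import numpy as np

def mutual_info(a, b):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    _, ia = np.unique(a, return_inverse=True)
    _, ib = np.unique(b, return_inverse=True)
    joint = np.zeros((ia.max() + 1, ib.max() + 1))
    np.add.at(joint, (ia, ib), 1)
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])).sum())

def psi(V, X):
    """Psi = I(V_t; V_{t+1}) - sum_i I(X_t^i; V_{t+1})."""
    return mutual_info(V[:-1], V[1:]) - sum(
        mutual_info(X[:-1, i], V[1:]) for i in range(X.shape[1]))

# Toy system: the parity of two bits evolves as a slow Markov chain,
# while each bit alone is an independent fair coin and predicts nothing.
rng = np.random.default_rng(0)
T = 5000
X = np.zeros((T, 2), dtype=int)
X[0] = rng.integers(0, 2, size=2)
for t in range(T - 1):
    parity = X[t, 0] ^ X[t, 1]
    if rng.random() < 0.1:
        parity ^= 1                      # parity flips with probability 0.1
    b = rng.integers(0, 2)
    X[t + 1] = [b, b ^ parity]           # bits are random, parity is preserved
V = X[:, 0] ^ X[:, 1]                    # candidate macroscopic variable
print(round(psi(V, X), 3))               # positive: evidence of emergence
```

Because each bit alone carries essentially no information about the future parity while the parity predicts itself well, the estimate comes out clearly positive.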
To verify that the information related to macaque movement is an emergent feature of its cortical activity, Rosas et al. performed the following experiment: they used the electrocorticogram (ECoG) of macaques as the observational data of the microscopic dynamics. For the macroscopic state variable <math>V</math>, they chose time series of the macaques' limb movement trajectories obtained by motion capture (MoCap); the ECoG and MoCap data consist of 64 channels and 3 channels respectively. Since the original MoCap data do not satisfy the conditional independence assumption of the supervenience feature, they used [[partial least squares]] and [[support vector machine]] algorithms to infer the part of neural activity encoded in the ECoG signal that predicts macaque behavior, and conjectured that this information is an emergent feature of the underlying neural activity. Finally, based on the microscopic state and the computed macroscopic features, the authors verified the existence of causal emergence.
       
=====Machine Learning-based Method=====
Kaplanis et al. <ref name=":2" />, building on methods from [[representation learning]], use an algorithm to spontaneously learn the macroscopic state variable <math>V</math> by maximizing <math>\mathrm{\Psi}</math> (i.e., Equation {{EquationNote|1}}). Specifically, a neural network <math>f_{\theta}</math> learns the representation function that coarse-grains the microscopic input <math>X_t</math> into the macroscopic output <math>V_t</math>, while neural networks <math>g_{\phi}</math> and <math>h_{\xi}</math> learn to estimate the mutual information terms <math>I(V_t;V_{t + 1})</math> and <math>\sum_i I(V_{t + 1};X_{t}^i)</math> respectively. The method then optimizes the networks by maximizing the difference between the two (i.e., <math>\mathrm{\Psi}</math>). The architecture of this neural network system is shown in Figure a below.
====Neural Information Compression Method====
In recent years, emerging [[artificial intelligence]] technologies have solved a series of major problems. At the same time, machine learning methods equipped with carefully designed [[neural network]] structures and [[automatic differentiation]] technology can approximate any function in a huge function space. Therefore, [[Zhang Jiang]] et al. proposed a data-driven method based on neural networks to identify causal emergence from time series data <ref name="NIS">Zhang J, Liu K. Neural information squeezer for causal emergence[J]. Entropy, 2022, 25(1): 26.</ref><ref name=":6" />. This method can automatically extract effective coarse-graining strategies and macroscopic dynamics, overcoming various deficiencies of the Rosas method <ref name=":5" />.
In this work, the input is time series data <math>(X_1,X_2,...,X_T)</math> with <math>X_t\equiv (X_t^1,X_t^2,...,X_t^p)</math>, where <math>p</math> is the dimension of the input data. The authors assume that this data is generated by a general [[stochastic dynamical system]]:
Here, <math>\mathcal{J}</math> is the dimension-averaged <math>EI</math> (see the entry [[effective information]]), <math>\mathrm{\phi}</math> is the coarse-graining strategy function, <math>f_{q}</math> is the macroscopic dynamics, <math>q</math> is the dimension of the coarsened macroscopic state, and <math>\hat{X}_{t + 1}</math> is the framework's prediction of the microscopic state at time <math>t + 1</math>. This prediction is obtained by applying the inverse coarse-graining operation (with inverse coarse-graining function <math>\phi^{\dagger}</math>) to the macroscopic state prediction <math>\hat{Y}_{t + 1}</math> at time <math>t + 1</math>. Here <math>\hat{Y}_{t + 1}\equiv f_q(Y_t)</math> is the dynamics learner's prediction of the macroscopic state at time <math>t + 1</math> from the macroscopic state <math>Y_t</math> at time <math>t</math>, where <math>Y_t\equiv \phi(X_t)</math> is the macroscopic state at time <math>t</math>, obtained by coarse-graining <math>X_t</math> with <math>\phi</math>. Finally, <math>\hat{X}_{t + 1}</math> is compared with the real microscopic data <math>X_{t + 1}</math> to obtain the microscopic prediction error.
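As a concrete illustration of this prediction pipeline (not the paper's implementation: a fixed rotation stands in for the trained invertible network, and the dynamics and all numbers are made up for the example), one step of <math>X_t \to Y_t \to \hat{Y}_{t+1} \to \hat{X}_{t+1}</math> can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 2, 1                        # microscopic / macroscopic dimensions

# A fixed rotation stands in for the trained invertible network;
# the encoder keeps the first q components of the rotated vector.
theta = 0.7
Rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
Rot_inv = Rot.T                    # a rotation is inverted by its transpose

def phi(x):                        # coarse-graining Y_t = phi(X_t)
    return (Rot @ x)[:q]

def phi_dagger(y):                 # inverse coarse-graining: pad with noise, invert
    z = rng.normal(size=p - q)
    return Rot_inv @ np.concatenate([y, z])

def f_q(y):                        # toy macroscopic dynamics (illustrative)
    return 0.9 * y

x_t = np.array([1.0, -0.5])
y_t = phi(x_t)                     # macroscopic state
y_hat = f_q(y_t)                   # predicted next macroscopic state
x_hat = phi_dagger(y_hat)          # predicted next microscopic state
print(x_hat.shape)
```

Comparing `x_hat` against the observed <math>X_{t+1}</math> over many steps gives the microscopic prediction error that training minimizes.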
=====NIS=====
To identify causal emergence in a system, the authors propose the [[neural information squeezer]] (NIS) architecture <ref name="NIS" />. It follows an encoder-dynamics learner-decoder framework: the three parts respectively coarse-grain the raw data into the macroscopic state, fit the macroscopic dynamics, and perform the inverse coarse-graining operation (decoding the macroscopic state, combined with random noise, back into the microscopic state). The authors use an [[invertible neural network]] (INN) to construct the encoder and decoder, which approximately correspond to the coarse-graining function <math>\phi</math> and the inverse coarse-graining function <math>\phi^{\dagger}</math> respectively. An invertible network is used because it can simply be inverted to obtain the inverse coarse-graining function (i.e., <math>\phi^{\dagger}\approx \phi^{-1}</math>). The framework can be viewed as a neural information compressor: it pushes noisy microscopic state data through a narrow information channel, compresses it into a macroscopic state and discards useless information, so that the causality of the macroscopic dynamics becomes stronger, and then decodes it into a prediction of the microscopic state. The model framework of the NIS method is shown in the following figure:
Here <math>\psi</math> is an invertible function implemented by an [[invertible neural network]], and <math>\chi</math> is a [[projection function]] that removes the last <math>p - q</math> components of a <math>p</math>-dimensional vector, where <math>p,q</math> are the dimensions of the microscopic and macroscopic states respectively. <math>\circ</math> denotes function composition.
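A minimal sketch of this structure, assuming a single additive coupling layer as the bijection <math>\psi</math> (with a fixed <math>\tanh</math> standing in for a trained sub-network), shows why <math>\psi</math> is exactly invertible while the composed encoder <math>\phi = \chi \circ \psi</math> is not:

```python
import numpy as np

# One additive coupling layer: an easily invertible bijection of the kind
# used to build psi. A fixed tanh stands in for a trained sub-network.
def shift(a):
    return np.tanh(a)

def psi_forward(x):
    a, b = x[:1], x[1:]
    return np.concatenate([a, b + shift(a)])   # a passes through unchanged

def psi_inverse(u):
    a, c = u[:1], u[1:]
    return np.concatenate([a, c - shift(a)])   # exact inverse by subtraction

p, q = 2, 1

def chi(u):                       # projection: drop the last p - q components
    return u[:q]

def phi(x):                       # encoder phi = chi o psi
    return chi(psi_forward(x))

x = np.array([0.3, -1.2])
assert np.allclose(psi_inverse(psi_forward(x)), x)   # psi is exactly invertible
print(phi(x))                                        # phi discards information
```

Stacking several such layers (alternating which components pass through) yields the expressive invertible networks used in practice; the projection <math>\chi</math> is where information is actually discarded.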
However, directly optimizing the dimension-averaged [[effective information]] is difficult, so the article <ref name="NIS" /> does not optimize Equation {{EquationNote|1}} directly. Instead, the authors divide the optimization into two stages. The first stage minimizes the microscopic state prediction error for a given macroscopic scale <math>q</math>, that is, <math>\min _{\phi, f_q, \phi^{\dagger}}\left\|\phi^{\dagger}(\hat{Y}_{t + 1}) - X_{t + 1}\right\|<\epsilon</math>, yielding the optimal macroscopic dynamics <math>f_q^\ast</math>; the second stage searches over the hyperparameter <math>q</math> to maximize the effective information <math>\mathcal{J}</math>, that is, <math>\max_{q}\mathcal{J}(f_{q}^\ast)</math>. Practice has shown that this method can effectively find macroscopic dynamics and coarse-graining functions, although it does not truly maximize EI.
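The two-stage scheme can be sketched on a toy linear system. All choices below are illustrative assumptions, not the paper's formulas: <math>\phi</math> is a PCA projection fit by least squares, and the dimension-averaged EI is replaced by a simple log-singular-value proxy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a 1-D latent with decaying dynamics, embedded in 3-D observations.
T, p = 500, 3
z = np.zeros(T); z[0] = 1.0
for t in range(T - 1):
    z[t + 1] = 0.8 * z[t] + 0.05 * rng.normal()
A = rng.normal(size=(p, 1))
X = z[:, None] @ A.T + 0.01 * rng.normal(size=(T, p))

def stage_one(q):
    """Stage 1: for a fixed q, fit a linear coarse-graining and macro
    dynamics by least squares; return the dynamics and prediction error."""
    U = np.linalg.svd(X - X.mean(0), full_matrices=False)[2][:q].T  # (p, q)
    Y = (X - X.mean(0)) @ U                    # macroscopic trajectory
    F = np.linalg.lstsq(Y[:-1], Y[1:], rcond=None)[0]  # Y_t -> Y_{t+1}
    X_hat = Y[:-1] @ F @ U.T + X.mean(0)       # decode back to microscopic
    return F, np.abs(X_hat - X[1:]).mean()

def j_proxy(F):
    """Stand-in for the dimension-averaged EI of linear macro dynamics:
    mean log singular value (an assumption, not the paper's formula)."""
    s = np.linalg.svd(F, compute_uv=False)
    return float(np.log(s).mean())

# Stage 2: sweep q and keep the best J among models meeting the error bound.
epsilon = 0.1
best = max((q for q in range(1, p + 1) if stage_one(q)[1] < epsilon),
           key=lambda q: j_proxy(stage_one(q)[0]))
print("selected q =", best)
```

The sweep selects the one-dimensional macro state, since extra dimensions only fit noise and drag the proxy down.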
'''Theorem 1''': The [[information bottleneck]] of the neural information squeezer. For any bijection <math>\mathrm{\psi}</math>, projection <math>\chi</math>, macroscopic dynamics <math>f</math> and Gaussian noise <math>z_{p - q}\sim\mathcal{N}\left (0,I_{p - q}\right )</math>,
======Comparison with Classical Theories======
The [[NIS]] framework has many similarities with the [[computational mechanics]] framework mentioned in previous sections: NIS can be regarded as an <math>\epsilon</math>-machine. The set of all historical processes <math>\overleftarrow{S}</math> in computational mechanics can be regarded as the microscopic state, and the states <math>R \in \mathcal{R}</math> represent macroscopic states. The function <math>\eta</math> can be understood as a coarse-graining function, and <math>\epsilon</math> as an effective coarse-graining strategy. <math>T</math> corresponds to effective macroscopic dynamics. The minimum-randomness criterion characterizes the determinism of the macroscopic dynamics and can be replaced by [[effective information]] in causal emergence. When the entire framework is well trained and can accurately predict the future microscopic state, the encoded macroscopic state converges to the effective state, which can be regarded as the [[causal state]] in computational mechanics.
At the same time, the [[NIS]] framework also has similarities with the G-emergence theory mentioned earlier. For example, [[NIS]] also adopts the idea of [[Granger causality]]: optimizing the effective macroscopic state by predicting the microscopic state at the next time step. However, there are several obvious differences between these two frameworks: a) In the G-emergence theory, the macroscopic state needs to be manually selected, while in [[NIS]], the macroscopic state is obtained by automatically optimizing the coarse-graining strategy; b) NIS uses neural networks to predict future states, while G-emergence uses autoregressive techniques to fit the data.
       
======Computational Examples======
The authors of NIS conducted experiments on the [[spring oscillator model]]; the results are shown in the following figure. Figure a shows that the encodings at the next moment coincide linearly with the iterates of the macroscopic dynamics, verifying the effectiveness of the model. Figure b shows that the two learned dynamics also coincide with the real dynamics, further verifying effectiveness. Figure c shows the model's multi-step prediction, where the prediction and the real curve are very close. Figure d shows the magnitude of causal emergence at different scales: causal emergence is most significant at scale 2, corresponding to the real spring oscillator model, in which only two states (position and velocity) are needed to describe the entire system.
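For readers who want to reproduce the flavor of this experiment, a spring oscillator time series can be generated as follows. The two-noisy-sensor observation model is an illustrative assumption, not necessarily the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frictionless spring oscillator: two macroscopic states, position z and
# velocity v. The "microscopic" data are noisy observations of z.
dt, T = 0.05, 1000
z, v = 1.0, 0.0
macro, micro = [], []
for _ in range(T):
    macro.append([z, v])
    # two noisy position sensors as the microscopic channels (assumption)
    micro.append([z + 0.05 * rng.normal(), z + 0.05 * rng.normal()])
    z = z + v * dt                      # semi-implicit Euler step
    v = v - z * dt
macro, micro = np.array(macro), np.array(micro)

# The ideal oscillator approximately conserves energy, which is why two
# macroscopic dimensions suffice to describe the whole trajectory.
energy = 0.5 * (macro[:, 0] ** 2 + macro[:, 1] ** 2)
print(round(float(energy.std() / energy.mean()), 4))
```

Feeding `micro` to an identification method and checking that the recovered macro state has dimension 2 mirrors the experiment described above.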
=====NIS+=====
Although NIS was the first to propose a scheme for identifying causal emergence in data by optimizing EI, the method has a shortcoming: the optimization is split into two stages and does not truly maximize the effective information, i.e., Equation {{EquationNote|1}}. Therefore, [[Yang Mingzhe]] et al. <ref name=":6" /> further improved this method and proposed the [[NIS+]] scheme. By introducing reverse dynamics and a [[reweighting technique]], the original maximization of effective information is transformed, via a variational inequality, into maximizing its variational lower bound, so that the objective function can be optimized directly.
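The role of the reweighting can be illustrated in isolation: inverse probability weights turn samples drawn from a non-uniform macro-state distribution into an effectively uniform one, the distribution under which EI is defined. The histogram-based sketch below is an illustrative toy, not the NIS+ implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed macro-states cluster around 0, but EI assumes a uniform
# intervention distribution; inverse probability weights correct for this.
y = rng.normal(0, 0.5, size=2000)

bins = np.linspace(-2, 2, 21)
idx = np.clip(np.digitize(y, bins) - 1, 0, len(bins) - 2)
counts = np.bincount(idx, minlength=len(bins) - 1)
p_hat = counts / counts.sum()            # histogram density estimate

# weight each sample by (uniform density) / (estimated density)
uniform = 1.0 / (len(bins) - 1)
w = uniform / np.maximum(p_hat[idx], 1e-12)
w *= len(y) / w.sum()                    # normalize to mean weight 1

# After reweighting, the histogram is flat over all occupied bins.
reweighted = np.bincount(idx, weights=w, minlength=len(bins) - 1)
print(np.round(reweighted / reweighted.sum(), 3))
```

In NIS+ the analogous weights multiply the training loss so that rarely visited macro-states count as much as common ones.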
       
======Mathematical Principles======
Specifically, using a variational inequality and the [[inverse probability weighting]] method, the constrained optimization problem given by Equation {{EquationNote|2}} can be transformed into the following unconstrained minimization problem:
======Case Analysis======
The article conducts experiments on several time series data sets, including data generated by the disease transmission model [[SIR dynamics]], the bird flock model ([[Boids model]]) and the cellular automaton [[Game of Life]], as well as real human subjects' fMRI signals from the [[brain nervous system]]. Here we select the bird flock and brain signal experiments for description.
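As an illustration of the first kind of training data, a minimal deterministic SIR trajectory with noisy observations can be generated as follows (the parameter values and the observation model are illustrative assumptions, not those of the paper):

```python
import numpy as np

# Minimal SIR dynamics of the kind used to generate training data.
def sir_step(s, i, beta=0.3, gamma=0.1, dt=0.1):
    new_inf = beta * s * i * dt          # susceptible -> infected
    new_rec = gamma * i * dt             # infected -> recovered
    return s - new_inf, i + new_inf - new_rec

rng = np.random.default_rng(0)
s, i = 0.99, 0.01
traj = []
for _ in range(1000):
    # "microscopic" observations: two noisy copies of (S, I), one simple
    # way to embed the 2-D macro state in a 4-D observation (assumption)
    traj.append([s + 0.01 * rng.normal(), i + 0.01 * rng.normal(),
                 s + 0.01 * rng.normal(), i + 0.01 * rng.normal()])
    s, i = sir_step(s, i)
traj = np.array(traj)
print(traj.shape, round(s, 3))
```

An identification method applied to `traj` should recover a two-dimensional macroscopic state corresponding to (S, I).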
The following figure shows the experimental results of NIS+ learning the flocking behavior of the Boids model. (a) and (e) give the actual and predicted trajectories of bird flocks under different conditions. Specifically, the author divides the bird flock into two groups and compares the multi-step prediction results under different noise levels (<math>\alpha</math> is 0.001 and 0.4 respectively). The prediction is good when the noise is relatively small, and the prediction curve will diverge when the noise is relatively large. (b) shows that the mean absolute error (MAE) of multi-step prediction gradually increases as the radius r increases. (c) shows the change of causal emergence measure <math>\Delta J</math> and prediction error (MAE) under different dimensions (q) with the change of training epoch. The author finds that causal emergence is most significant when the macroscopic state dimension q = 8. (d) is the attribution analysis of macroscopic variables to microscopic variables, and the obtained significance map intuitively describes the learned coarse-graining function. Here, each macroscopic dimension can correspond to the spatial coordinate (microscopic dimension) of each bird. The darker the color, the higher the correlation. Here, the microscopic coordinates corresponding to the maximum correlation of each macroscopic state dimension are highlighted with orange dots. These attribution significance values are obtained by using the Integrated Gradient (referred to as IG) method. The horizontal axis represents the x and y coordinates of 16 birds in the microscopic state, and the vertical axis represents 8 macroscopic dimensions. The light blue dotted line distinguishes the coordinates of different individual Boids, and the blue solid line separates the two bird flocks. (f) and (g) represent the changing trends of causal emergence measure <math>\Delta J</math> and normalized error MAE under different noise levels. 
(f) represents the influence of changes in external noise (that is, adding observation noise to microscopic data) on causal emergence. (g) represents the influence of internal noise (represented by <math>\alpha</math>, added by modifying the dynamics of the Boids model) on causal emergence. In (f) and (g), the horizontal line represents the threshold that violates the error constraint in Equation {{EquationNote|1}}. When the normalized MAE is greater than the threshold of 0.3, the constraint is violated and the result is unreliable.
+
The following figure shows the experimental results of NIS+ learning the flocking behavior of the Boids model. (a) and (e) give the actual and predicted trajectories of bird flocks under different conditions. Specifically, the authors divide the flock into two groups and compare multi-step prediction results under different noise levels (<math>\alpha</math> = 0.001 and 0.4, respectively). Predictions track the real trajectories well when the noise is small, while the predicted curves diverge when the noise is large. (b) shows that the mean absolute error (MAE) of multi-step prediction gradually increases as the radius r increases. (c) shows how the [[causal emergence measure]] <math>\Delta J</math> and the prediction error (MAE) change over training epochs for different macroscopic dimensions q; the authors find that causal emergence is most significant when q = 8. (d) is the attribution analysis of macroscopic variables with respect to microscopic variables; the resulting saliency map intuitively depicts the learned coarse-graining function. Each macroscopic dimension corresponds to the spatial coordinates (microscopic dimensions) of the birds: the darker the color, the stronger the correlation, and the microscopic coordinate most strongly correlated with each macroscopic dimension is highlighted with an orange dot. These attribution saliency values are obtained with the [[Integrated Gradient]] (IG) method. The horizontal axis represents the x and y coordinates of the 16 birds in the microscopic state, and the vertical axis represents the 8 macroscopic dimensions. The light blue dotted lines separate the coordinates of individual Boids, and the blue solid line separates the two flocks. (f) and (g) show how the causal emergence measure <math>\Delta J</math> and the normalized MAE change under different noise levels:
(f) shows the effect of external noise (observation noise added to the microscopic data) on causal emergence, and (g) shows the effect of internal noise (controlled by <math>\alpha</math> and introduced by modifying the dynamics of the Boids model). In (f) and (g), the horizontal line marks the threshold of the error constraint in Equation {{EquationNote|1}}: when the normalized MAE exceeds the threshold of 0.3, the constraint is violated and the result is unreliable.
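The Integrated Gradient method attributes a scalar output to each input dimension by integrating the gradient along a straight path from a baseline to the input. A minimal numerical sketch is given below; it is not the paper's implementation (the authors apply IG to a trained NIS+ encoder via automatic differentiation), and the function <code>f</code> and its inputs here are toy stand-ins, with gradients estimated by finite differences.

```python
import numpy as np

def integrated_gradients(f, x, baseline=None, steps=64):
    """Approximate Integrated Gradients of scalar function f at input x,
    via a midpoint Riemann sum along the straight path baseline -> x.
    Gradients are estimated by central finite differences, so this sketch
    needs no autodiff framework."""
    if baseline is None:
        baseline = np.zeros_like(x)
    alphas = (np.arange(steps) + 0.5) / steps   # path midpoints in (0, 1)
    total_grad = np.zeros_like(x)
    eps = 1e-5
    for a in alphas:
        point = baseline + a * (x - baseline)
        grad = np.zeros_like(x)
        for i in range(x.size):                 # finite-difference gradient
            d = np.zeros_like(x)
            d[i] = eps
            grad[i] = (f(point + d) - f(point - d)) / (2 * eps)
        total_grad += grad
    # attribution per input dimension: (x_i - baseline_i) * avg gradient_i
    return (x - baseline) * total_grad / steps

# Toy usage: attribute a "macroscopic coordinate" that is a mean of inputs.
f = lambda v: 0.5 * (v[0] + v[1])
attr = integrated_gradients(f, np.array([2.0, 4.0]))
# For linear f the attributions sum to f(x) - f(baseline): attr ≈ [1.0, 2.0]
```

In the paper's setting, summing such attributions over each bird's x and y coordinates yields the per-individual saliency shown in panel (d).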
This set of experiments shows that [[NIS+]] can learn macroscopic states and coarse-graining strategies by maximizing EI. This maximization enhances the model's ability to generalize to situations beyond the training data. The learned macroscopic state effectively captures the average [[group behavior]] and can be attributed to individual positions using the Integrated Gradient method. In addition, the degree of causal emergence increases with external noise and decreases with internal noise, indicating that the model can eliminate external noise through coarse-graining but cannot reduce internal noise.
The brain experiment is based on real fMRI data collected from 830 human subjects under two experimental conditions: in the first, recordings were made while the subjects performed a visual task, watching a short movie clip; in the second, recordings were made while the subjects were in a resting state. Because the original dimensionality is high, the authors first reduced the original 14000-dimensional data to 100 dimensions using the [[Schaefer atlas]], with each dimension corresponding to a brain region. They then trained NIS+ on these data and extracted dynamics at six different macroscopic scales. Figure a shows the multi-step prediction errors at different scales. Figure b compares the EI of the NIS and NIS+ methods across macroscopic dimensions in the resting state and in the movie-watching visual task. The authors found that in the visual task, causal emergence is most significant at macroscopic dimension q = 1, and attribution analysis shows that the visual area plays the largest role (Figure c), which is consistent with the real scene. Figure d shows different perspective views of the brain-region attribution. In the resting state, one macroscopic dimension is not enough to predict the microscopic time series data; the dimension with the largest causal emergence lies between 3 and 7.
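Atlas-based dimensionality reduction of this kind assigns each fMRI vertex a parcel label and averages the signal within each parcel. The sketch below illustrates the idea on toy-sized arrays; it is a generic stand-in, not the paper's preprocessing pipeline, and the shapes and data are hypothetical (the real reduction maps ~14000 vertices to the 100 Schaefer parcels).

```python
import numpy as np

def parcellate(signal, labels, n_parcels):
    """Average a vertex-level time series within atlas parcels.
    signal: (T, V) array, T time points over V vertices;
    labels: (V,) array of parcel indices in [0, n_parcels);
    returns a (T, n_parcels) parcel-averaged time series."""
    out = np.zeros((signal.shape[0], n_parcels))
    for p in range(n_parcels):
        out[:, p] = signal[:, labels == p].mean(axis=1)
    return out

# Toy usage: 5 time points, 12 "vertices" grouped into 4 parcels of 3.
rng = np.random.default_rng(0)
sig = rng.normal(size=(5, 12))
labels = np.repeat(np.arange(4), 3)
reduced = parcellate(sig, labels, 4)   # shape (5, 4)
```

The reduced (T, 100) series is what NIS+ then takes as its microscopic state.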
These experiments show that [[NIS+]] can identify causal emergence in data and discover the emergent macroscopic dynamics and coarse-graining strategies; further experiments also show that the NIS+ model can improve out-of-distribution generalization through EI maximization.
 
      
==Applications==
 