According to Judea Pearl’s ladder of causality, causal inference can be divided into three levels: association, intervention, and counterfactuals. The higher the level, the stronger the causal features. Directly estimating mutual information from observational data measures the level of association. If we can intervene in the variables, i.e., set a variable to a specific value or make it follow a particular distribution, we move up to the intervention level. By introducing the do-operator in the definition of EI, we allow EI to capture causal features more effectively than mutual information alone.
 
From a practical perspective, incorporating the do-operator in EI’s calculation separates the data from the dynamics, eliminating the effect of the data distribution (i.e., the distribution of X) on the EI measurement. In causal graphs, the do-operator cuts off all causal arrows pointing to the intervened variable, preventing confounding factors from creating spurious associations. Similarly, in EI’s definition, the do-operator removes all causal arrows pointing to the cause variable X, including influences from other variables (both observable and unobservable). This ensures that EI captures the intrinsic characteristics of the dynamics itself.
    
The introduction of the do-operator makes EI distinct from other information metrics. The key difference is that EI is solely a function of the causal mechanism, which allows it to more precisely capture the essence of causality compared to other metrics like transfer entropy. However, this also means that EI requires knowledge of or access to the causal mechanism, which may be challenging if only observational data is available.
 
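To make this distinction concrete, the following minimal Python sketch (not taken from the article; the transition matrix and the skewed observational distribution are arbitrary illustrative numbers) computes EI of a fixed causal mechanism under the intervention <math>do(X\sim U)</math> and compares it with the mutual information obtained from observational data:

<syntaxhighlight lang="python">
import numpy as np

def mutual_info(p_x, tpm):
    """Mutual information I(X;Y) in bits, given an input distribution p_x
    and a transition probability matrix tpm[i, j] = Pr(Y=j | X=i)."""
    p_xy = p_x[:, None] * tpm            # joint distribution Pr(X=i, Y=j)
    p_y = p_xy.sum(axis=0)               # marginal distribution Pr(Y=j)
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log2(p_xy[mask] / np.outer(p_x, p_y)[mask]))

# A fixed causal mechanism (transition probability matrix).
tpm = np.array([[0.9, 0.1],
                [0.2, 0.8]])

# EI: intervene so that X follows the uniform distribution, do(X ~ U).
ei = mutual_info(np.array([0.5, 0.5]), tpm)

# Association only: mutual information under a skewed observational distribution of X.
mi_obs = mutual_info(np.array([0.95, 0.05]), tpm)

print(f"EI = {ei:.3f} bits")       # depends only on the mechanism
print(f"MI = {mi_obs:.3f} bits")   # also depends on the data distribution of X
</syntaxhighlight>

Changing the observational distribution of X changes the mutual information but leaves EI untouched, reflecting that EI is a function of the causal mechanism (the transition matrix) alone.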
{|
|-
|[math]\begin{aligned}&Det(P_m)=0.81\ bits,\\&Deg(P_m)=0\ bits,\\&EI(P_m)=0.81\ bits\end{aligned}[/math]||[math]\begin{aligned}&Det(P_M)=1\ bits,\\&Deg(P_M)=0\ bits,\\&EI(P_M)=1\ bits\end{aligned}[/math]
|}
 
==Feedforward Neural Networks==
 
For the task of automatically modeling complex systems, neural networks are often used to model the system dynamics. Specifically, for feedforward neural networks, [[张江|Zhang Jiang]] et al. derived a formula for the effective information of such networks<ref name="zhang_nis">{{cite journal|title=Neural Information Squeezer for Causal Emergence|first1=Jiang|last1=Zhang|first2=Kaiwei|last2=Liu|journal=Entropy|year=2022|volume=25|issue=1|page=26|url=https://api.semanticscholar.org/CorpusID:246275672}}</ref>. The input of the neural network is <math>x=(x_1,...,x_n)</math>, the output is <math>y=(y_1,...,y_n)</math>, and they satisfy <math>y=f(x)</math>, where <math>f</math> is the deterministic mapping implemented by the neural network. However, according to formula {{EquationNote|5}}, the mapping must contain noise in order to reflect uncertainty.

Therefore, in the neural network we assume that the computation from input to output is also uncertain, i.e., it also conforms to formula {{EquationNote|5}}:
    
<math>
y=f(x)+\xi
</math>
Here, [math]\xi\sim \mathcal{N}(0,\Sigma)[/math] is Gaussian noise and [math]\Sigma=\mathrm{diag}(\sigma_1,\sigma_2,\cdots,\sigma_n)[/math], where [math]\sigma_i[/math] is the mean squared error (MSE) of the i-th dimension. In other words, we assume that the neural network mapping from x to y follows a conditional Gaussian distribution with mean [math]f(x)[/math] and covariance [math]\Sigma[/math], that is:
    
<math>
y|x\sim \mathcal{N}(f(x),\Sigma)
</math>
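In practice, the noise terms are not given in advance but are estimated from data. The following is a minimal PyTorch sketch (not part of the original derivation; the function and variable names are illustrative assumptions) that estimates the per-dimension mean squared error of a trained network on held-out data, i.e., the <math>\sigma_i</math> terms described above:

<syntaxhighlight lang="python">
import torch

def estimate_sigma(f, x_val, y_val):
    """Per-dimension mean squared error of the trained network f on a
    held-out data set (x_val, y_val); the text above uses this as sigma_i."""
    with torch.no_grad():
        residual = f(x_val) - y_val      # shape: (num_samples, n)
    return (residual ** 2).mean(dim=0)   # one MSE value per output dimension
</syntaxhighlight>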
From this, applying the general result for high-dimensional mappings, we can give a general formula for the effective information of a neural network:
    
<math>\begin{gathered}EI(f)=I(do(x\sim \mathcal{U}([-\frac{L}{2},\frac{L}{2}]^n));y)\approx-\frac{n+n\ln(2\pi)+\sum_{i=1}^n\ln\sigma_i^2}{2}+n\ln(2L)+\mathbb{E}_{x\sim \mathcal{U}([-\frac{L}{2},\frac{L}{2}]^n)}\left(\ln|\det(\partial_{x}f(x))|\right)\end{gathered}</math>
Here, <math>\mathcal{U}\left(\left[-L/2, L/2\right]^n\right)</math> denotes the <math>n</math>-dimensional uniform distribution on <math>\left[-L/2, L/2\right]^n</math>, and <math>\det</math> denotes the determinant. The dimension-averaged EI is:
    
<math>
\mathcal{J}(f)=\frac{EI(f)}{n}
</math>
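The formula above can be evaluated numerically by Monte Carlo sampling under the uniform intervention. The sketch below is an illustrative PyTorch implementation, not the authors' released code; the function name <code>effective_information</code> and its arguments are assumptions, and <code>sigma2</code> is assumed to hold the <math>\sigma_i^2</math> terms (estimated, for example, as sketched earlier). The result is in nats, since the formula uses the natural logarithm.

<syntaxhighlight lang="python">
import math
import torch

def effective_information(f, n, sigma2, L, num_samples=1000):
    """Monte Carlo estimate of EI(f) for a mapping f: R^n -> R^n, following
    the formula above. sigma2 holds the per-dimension terms sigma_i^2."""
    # Noise term: -(n + n*ln(2*pi) + sum_i ln(sigma_i^2)) / 2
    noise_term = -0.5 * (n + n * math.log(2 * math.pi) + torch.log(sigma2).sum().item())
    # Intervention term: n * ln(2L), from do(x ~ U([-L/2, L/2]^n))
    intervention_term = n * math.log(2 * L)
    # Jacobian term: E_x[ ln|det(d f(x)/dx)| ] under the uniform intervention
    log_det_sum = 0.0
    for _ in range(num_samples):
        x = (torch.rand(n) - 0.5) * L                   # x ~ U([-L/2, L/2]^n)
        jac = torch.autograd.functional.jacobian(f, x)  # shape: (n, n)
        _, logabsdet = torch.linalg.slogdet(jac)
        log_det_sum += logabsdet.item()
    ei = noise_term + intervention_term + log_det_sum / num_samples
    return ei, ei / n   # EI(f) and the dimension-averaged EI

# Example with an arbitrary invertible toy mapping (illustrative only):
ei, dim_avg_ei = effective_information(
    lambda x: torch.tanh(x) + x, n=2,
    sigma2=torch.tensor([0.01, 0.01]), L=2.0)
</syntaxhighlight>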

If the micro dynamics are fitted with a neural network <math>f</math> and the macro dynamics with a neural network <math>F</math>, then the degree of causal emergence can be calculated by:

<math>
\begin{gathered}
\mathcal{\Delta J}\equiv \frac{EI(F)}{m}-\frac{EI(f)}{n}\approx \frac{\sum_{i=1}^n\ln\sigma_i}{n}-\frac{\sum_{i=1}^m\ln\sigma'_i}{m}+\frac{1}{m}\cdot\mathbb{E}_{X\sim \mathcal{U}([-\frac{L}{2},\frac{L}{2}]^m)}\left(\ln|\det(\partial_{X}F(X))|\right)-\frac{1}{n}\cdot\mathbb{E}_{x\sim \mathcal{U}([-\frac{L}{2},\frac{L}{2}]^n)}\left(\ln|\det(\partial_{x}f(x))|\right)
\end{gathered}
</math>

Here, [math]m[/math] is the dimension of the macroscopic state, and [math]\sigma'_i[/math] is the mean squared error (MSE) of the i-th macroscopic dimension, which can be computed from the gradient of the macroscopic state [math]X_i[/math] under the backpropagation algorithm.
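With a function such as the <code>effective_information</code> sketch above, the degree of causal emergence could be estimated by comparing the dimension-averaged EI of the two networks. In the snippet below, <code>f</code>, <code>F</code>, <code>n</code>, <code>m</code>, <code>sigma2_micro</code>, and <code>sigma2_macro</code> are placeholders for the trained micro and macro models and their estimated noise terms.

<syntaxhighlight lang="python">
# Dimension-averaged EI of the micro network f (dimension n) and of the
# macro network F (dimension m), estimated as in the sketch above.
_, dim_avg_ei_micro = effective_information(f, n, sigma2_micro, L=1.0)
_, dim_avg_ei_macro = effective_information(F, m, sigma2_macro, L=1.0)

# Causal emergence is indicated by Delta J > 0.
delta_J = dim_avg_ei_macro - dim_avg_ei_micro
</syntaxhighlight>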

Note that all of the above conclusions require that <math>\partial_{x}f(x)</math> is nonzero. When <math>\partial_{x}f(x)</math> is 0 everywhere for all <math>x</math>, we obtain <math>EI(f)\approx 0</math>. For the more general case, in which the matrix is not of full rank, please refer to [[神经网络的有效信息]].
 
==Source Code for EI of Continuous Systems==

Both numerical solutions based on invertible neural networks and analytical solutions for stochastic iterative systems provide ways to compute EI.