</math>
|{{EquationRef|3}}}}
Here, <math>p_{ij}</math> denotes the conditional probability of transitioning from state <math>i</math> to state <math>j</math> in matrix <math>P</math>. Since each row of <math>P</math> is subject to the normalization constraint, the EI function has <math>N(N-1)</math> degrees of freedom. We can select <math>1\le i\le N</math> and <math>1\le j\le N-1</math>, with <math>p_{iN}</math> representing the conditional probability in the i-th row and N-th column. <math>\bar{p}_{\cdot j}</math> and <math>\bar{p}_{\cdot N}</math> represent the average conditional probabilities of the j-th and N-th columns, respectively.
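As a concrete illustration of computing EI from a transition probability matrix, here is a minimal Python sketch (our own illustration, not code from the cited papers; the helper name <code>ei</code> and the smoothing constant <code>eps</code> are our choices), using a uniform intervention on the current state:

<syntaxhighlight lang="python">
import numpy as np

def ei(P, eps=1e-12):
    """EI of an N x N transition probability matrix P (rows sum to 1).

    Under a uniform do() intervention on the current state,
    EI = (1/N) * sum_i KL( p_{i.} || pbar ), where pbar_j is the
    column average (the \bar{p}_{.j} in the text). Natural log is used.
    """
    P = np.asarray(P, dtype=float)
    p_bar = P.mean(axis=0)                  # column averages \bar{p}_{.j}
    kl_rows = np.sum(P * np.log((P + eps) / (p_bar + eps)), axis=1)
    return kl_rows.mean()

# A deterministic, non-degenerate 4-state system attains the maximum EI = ln 4
print(ei(np.eye(4)))   # ~1.386
</syntaxhighlight>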
where <math>\epsilon</math> and <math>\delta</math> denote the magnitudes of the observational and interventional noise, respectively.-->This kind of derivation was first seen in Hoel's 2013 paper [1] and was further discussed in detail in the "Neural Information Squeezer" paper [2].
===High-Dimensional Case===
We can extend the EI (Effective Information) calculation for one-dimensional variables to a more general n-dimensional scenario. Specifically:{{NumBlk|:|
<math>
\mathbf{y}=f(\mathbf{x})+\xi,
</math>
|{{EquationRef|5}}}}Let [math]\xi\sim \mathcal{N}(0,\Sigma)[/math], where <math>\Sigma</math> is the covariance matrix of the Gaussian noise <math>\xi</math>. First, we intervene on [math]\mathbf{x}[/math] such that it follows a uniform distribution over <math>[-L/2,L/2]^n\subset\mathcal{R}^n</math>, where <math>[-L/2,L/2]^n</math> represents a hypercube in n-dimensional space. We assume <math>\mathbf{y}\in\mathcal{R}^m</math>, where <math>n</math> and <math>m</math> are positive integers. In the presence of observational noise only, the EI can be generalized as follows:{{NumBlk|:|
<math>EI\approx \ln\left(\frac{L^n}{(2\pi e)^{m/2}}\right)+\frac{1}{L^n}\int_{[-\frac{L}{2},\frac{L}{2}]^n}\ln\left|\det\left(\frac{\partial_\mathbf{x} f(\mathbf{x})}{\Sigma^{1/2}}\right)\right| d\mathbf{x},
</math>
|{{EquationRef|6}}}}Here, <math>|\cdot|</math> denotes the absolute value and <math>\det</math> denotes the determinant.<!--To generalize information geometry to the case with both interventional and observational noise, an intermediate variable <math>\theta\subset\mathcal{R}^l</math> of dimension <math>l</math> must be introduced, such that <math>\mathbf{y}</math> cannot be controlled by directly intervening on <math>\mathbf{x}</math>; instead, we intervene on <math>\mathbf{x}</math> to influence <math>\theta</math>, which in turn indirectly influences <math>\mathbf{y}</math>. These three variables thus form a Markov chain: <math>\mathbf{x}\to\theta\to\mathbf{y}</math>.

In this case, two manifolds can be obtained: the effect manifold <math>\mathcal{M}_E=\{p(\mathbf{y}|\theta)\}_{\theta}</math>, with metric <math>g_{\mu\nu}=-\mathbb{E}_{p(\mathbf{y}|\theta)}\partial_{\mu}\partial_{\nu}\ln p(\mathbf{y}|\theta)</math>, and the intervention manifold <math>\mathcal{M}_I=\{\tilde{q}(\mathbf{x}|\theta)\}_{\theta\in \Theta}</math>, with metric <math>h_{\mu\nu}=-\mathbb{E}_{\tilde{q}(\mathbf{x}|\theta)}\partial_{\mu}\partial_{\nu}\ln \tilde{q}(\mathbf{x}|\theta)</math>, where <math>\tilde{q}\equiv \frac{q(\theta|\mathbf{x})}{\int q(\theta|\mathbf{x})d\mathbf{x}}</math> and <math>\partial_{\mu}=\partial/\partial \theta_{\mu}</math>. Together, the effect and intervention manifolds constitute what is called the causal geometry.

<math>
EI_g=\ln\frac{V_I}{(2\pi e)^{n/2}}-\frac{1}{2V_I}\int_\Theta\sqrt{|\det(h_{\mu\nu})|} \ln\left|\det\left( I_n+\frac{h_{\mu\nu}}{g_{\mu\nu}}\right)\right|d^l\theta,
</math>-->
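As a numerical illustration, equation {{EquationNote|6}} can be estimated by Monte Carlo sampling of the integral. The sketch below is our own, assumes <math>m=n</math> (so the determinant is well defined), and takes the Jacobian of <math>f</math> as a callable; all names are illustrative:

<syntaxhighlight lang="python">
import numpy as np

def ei_continuous(jacobian, Sigma, n, L=10.0, samples=100000, rng=None):
    """Monte Carlo estimate of equation (6) for y = f(x) + xi, with m = n.

    jacobian(x) must return the n x n matrix of partials of f at x;
    Sigma is the covariance of the Gaussian noise xi.
    """
    rng = np.random.default_rng(rng)
    # Sigma^{-1/2} via Cholesky: |det(Sigma^{-1/2} J)| = |det J| / sqrt(det Sigma)
    S_inv_half = np.linalg.inv(np.linalg.cholesky(Sigma))
    xs = rng.uniform(-L / 2, L / 2, size=(samples, n))
    log_dets = [np.log(abs(np.linalg.det(S_inv_half @ jacobian(x)))) for x in xs]
    return n * np.log(L) - (n / 2) * np.log(2 * np.pi * np.e) + np.mean(log_dets)

# Example: linear map f(x) = A x, for which the integrand is constant and
# EI = n ln L - (n/2) ln(2*pi*e) + ln|det A|
A = np.array([[1.0, 0.5], [0.0, 2.0]])
print(ei_continuous(lambda x: A, np.eye(2), n=2, samples=1000))  # ~2.46
</syntaxhighlight>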
==Dimension-Averaged EI==
In discrete-state systems, when comparing systems of different scales, we can compute either the direct EI difference or the normalized EI difference. The normalized EI is divided by [math]\log N[/math], where [math]N=\#(\mathcal{X})[/math] is the number of elements in the discrete state space [math]\mathcal{X}[/math].
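Reusing the <code>ei</code> sketch from above (a hypothetical helper), this normalization is a one-liner:

<syntaxhighlight lang="python">
P = np.eye(4)                   # any N x N transition probability matrix
eff = ei(P) / np.log(len(P))    # Eff = EI / ln N, equal to 1.0 here
</syntaxhighlight>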
However, when extending to continuous variables, using the original EI leads to unreasonable results. First, as shown in equation {{EquationNote|6}}, the EI formula contains the term [math]\ln L^n[/math]; since L is a large positive number, the value of EI is heavily affected by L. Second, when computing the normalized EI (Eff), the problem arises that a continuous state space has infinitely many elements, so using the element count directly would introduce an infinite quantity. One solution is to treat the volume of the space as the count N, so the normalization variable should be [math]n \ln L[/math], which is proportional to both <math>n</math> and <math>\ln L</math>, i.e.:
<math>
\mathrm{Eff}=\frac{EI}{n\ln L}
</math>
However, this expression still contains the term L, which strongly affects Eff. Moreover, when comparing the Eff of the microscopic (n-dimensional) and macroscopic (m-dimensional, with m < n) levels, i.e., when computing normalized causal emergence, L does not cancel out. This suggests that the normalization scheme for continuous-variable systems cannot simply be carried over from the discrete case.
When the framework of the [[神经信息压缩器|Neural Information Squeezer]] (NIS) was proposed<ref name="zhang_nis" />, the authors introduced another way of normalizing the effective information of continuous variables, namely dividing EI by the dimension of the state space, which resolves the problem of comparing EI across continuous state variables. This measure is called the '''Dimension-Averaged Effective Information''' (dEI) and is defined as:
<math>
\mathcal{J}\equiv\frac{EI}{n}
</math>
Here, <math>n</math> is the dimension of the state space. It can be proven that in a discrete state space, the '''dimension-averaged EI''' and the '''effectiveness''' measure (Eff) are in fact equivalent. EI for continuous variables is discussed in more detail below.
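For example, if the discrete state space consists of <math>n</math> binary variables, then <math>N=2^n</math>, and the two normalizations differ only by the constant <math>\log 2</math>:

<math>
\mathrm{Eff}=\frac{EI}{\log N}=\frac{EI}{n\log 2}=\frac{\mathcal{J}}{\log 2},
</math>

so with logarithms taken to base 2, Eff and <math>\mathcal{J}</math> coincide.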
For an n-dimensional iterative dynamical system, [math]\mathbf{y}[/math] and [math]\mathbf{x}[/math] are variables of the same dimension, so [math]m=n[/math]. Substituting equation {{EquationNote|6}} into the dimension-averaged EI gives:
<math>
\mathcal{J}\approx\ln\frac{L}{\sqrt{2\pi e}}+\frac{1}{n}\cdot\frac{1}{L^n}\int_{[-\frac{L}{2},\frac{L}{2}]^n}\ln\left|\det\left(\frac{\partial_\mathbf{x} f(\mathbf{x})}{\Sigma^{1/2}}\right)\right| d\mathbf{x}
</math>
Although L has still not disappeared, it cancels when we compute the '''dimension-averaged causal emergence'''. Suppose the n-dimensional state variable [math]\mathbf{x}_t[/math] can be projected onto an N-dimensional macroscopic state variable [math]\mathbf{X}_t[/math], with corresponding macroscopic dynamics F and noise covariance [math]\Sigma'[/math]. The difference between the dimension-averaged EI of the macrodynamics and that of the microdynamics is then:
<math>
\Delta\mathcal{J}\approx \frac{1}{N}\cdot\frac{1}{L^N}\int_{[-\frac{L}{2},\frac{L}{2}]^N}\ln\left|\det\left(\frac{\partial_\mathbf{X} F(\mathbf{X})}{\Sigma'^{1/2}}\right)\right| d\mathbf{X}-\frac{1}{n}\cdot\frac{1}{L^n}\int_{[-\frac{L}{2},\frac{L}{2}]^n}\ln\left|\det\left(\frac{\partial_\mathbf{x} f(\mathbf{x})}{\Sigma^{1/2}}\right)\right| d\mathbf{x}
</math>
Note that the integrals in the above equation can be written as expectations under uniform distributions, i.e. [math]\int_{[-\frac{L}{2},\frac{L}{2}]^n}\frac{1}{L^n}\cdot=\mathbb{E}_{\mathbf{x}\sim \mathcal{U}[-\frac{L}{2},\frac{L}{2}]^n}\cdot[/math], so the equation becomes:

<math>
\Delta \mathcal{J}\approx \frac{1}{N}\mathbb{E}_{\mathbf{X}\sim\mathcal{U}[-\frac{L}{2},\frac{L}{2}]^N}\ln \left|\det\left(\frac{\partial_\mathbf{X} F(\mathbf{X})}{\Sigma'^{1/2}}\right)\right| - \frac{1}{n}\mathbb{E}_{\mathbf{x}\sim\mathcal{U}[-\frac{L}{2},\frac{L}{2}]^n}\ln\left|\det\left(\frac{\partial_\mathbf{x} f(\mathbf{x})}{\Sigma^{1/2}}\right)\right|
</math>
Here [math]\mathcal{U}([-\frac{L}{2},\frac{L}{2}]^n)[/math] denotes the uniform distribution on the hypercube [math][-\frac{L}{2},\frac{L}{2}]^n[/math]. Although L is still implicitly present in the expectations, every term that explicitly contains L has been eliminated. In numerical practice, each expectation can be estimated by averaging over samples drawn from [math][-\frac{L}{2},\frac{L}{2}]^n[/math], so the result does not depend on the size of L. This demonstrates the rationale for introducing the dimension-averaged EI.
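The following sketch implements this sampling procedure (our own illustration; the callables <code>jac_macro</code> and <code>jac_micro</code>, returning the Jacobians of <math>F</math> and <math>f</math>, are assumptions):

<syntaxhighlight lang="python">
import numpy as np

def delta_j(jac_macro, Sigma_macro, N_dim, jac_micro, Sigma_micro, n_dim,
            L=10.0, samples=10000, rng=None):
    """Sample-based estimate of the dimension-averaged causal emergence."""
    rng = np.random.default_rng(rng)

    def term(jac, Sigma, dim):
        # (1/dim) * E_{x ~ U[-L/2, L/2]^dim} ln |det(Sigma^{-1/2} Jac(x))|
        S_inv_half = np.linalg.inv(np.linalg.cholesky(Sigma))
        xs = rng.uniform(-L / 2, L / 2, size=(samples, dim))
        vals = [np.log(abs(np.linalg.det(S_inv_half @ jac(x)))) for x in xs]
        return np.mean(vals) / dim

    return term(jac_macro, Sigma_macro, N_dim) - term(jac_micro, Sigma_micro, n_dim)
</syntaxhighlight>

Note that L enters only through the sampling range: the two explicit ln L terms have already cancelled, and for constant Jacobians the estimate is exactly independent of L.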
==Stochastic Iterative Systems==

The above conclusions can be generalized to linear iterative dynamical systems, that is, to systems of the form
<math>
\mathbf{x}_{t+1}=A\mathbf{x}_t+\varepsilon_t,
</math>
In that expression, [math]W[/math] is the coarse-graining matrix of order n×m, where m is the dimension of the macroscopic state space; it maps an arbitrary microscopic state [math]x_t[/math] to a macroscopic state [math]y_t[/math], and [math]W^{\dagger}[/math] denotes the pseudo-inverse of W. The first term is the emergence induced by determinism, called '''Determinism Emergence''' for short; the second term is the emergence induced by degeneracy, called '''Degeneracy Emergence''' for short. For details, see [[随机迭代系统的因果涌现|Causal Emergence in Stochastic Iterative Systems]].
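As a small illustration of the coarse-graining map and its pseudo-inverse (the dimensions and values below are made up for the example):

<syntaxhighlight lang="python">
import numpy as np

n, m = 4, 2
rng = np.random.default_rng(0)
W = rng.standard_normal((n, m))   # coarse-graining matrix of order n x m
x_t = rng.standard_normal(n)      # a microscopic state
y_t = x_t @ W                     # macroscopic state: micro -> macro
W_pinv = np.linalg.pinv(W)        # W†, the pseudo-inverse of W
x_back = y_t @ W_pinv             # least-squares lift back to micro space
</syntaxhighlight>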
 
==Feedforward Neural Networks==
For the task of automatically modeling complex systems, neural networks are often used to model the system dynamics. Specifically, for feedforward neural networks, [[张江|Jiang Zhang]] et al. derived a formula for the effective information of a feedforward neural network<ref name="zhang_nis">{{cite journal|title=Neural Information Squeezer for Causal Emergence|first1=Jiang|last1=Zhang|first2=Kaiwei|last2=Liu|journal=Entropy|year=2022|volume=25|issue=1|page=26|url=https://api.semanticscholar.org/CorpusID:246275672}}</ref>, where the input of the network is <math>x(x_1,...,x_n)</math>, the output is <math>y(y_1,...,y_n)</math>, and they satisfy <math>y=f(x)</math>, with <math>f</math> being the deterministic mapping implemented by the network. However, according to equation {{EquationNote|5}}, the mapping must contain noise in order to reflect uncertainty.
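In practice, the Jacobian of a trained network (needed by the continuous-EI sketch given earlier) can be obtained by automatic differentiation; as a framework-agnostic fallback, a finite-difference approximation also works. The helper below is purely illustrative:

<syntaxhighlight lang="python">
import numpy as np

def numeric_jacobian(f, x, h=1e-5):
    """Finite-difference Jacobian of a vector map f at point x.

    Here f stands for the network's forward function; the result can be
    wrapped in a callable and passed to the ei_continuous sketch above.
    """
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x))
    J = np.empty((fx.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = h
        J[:, j] = (np.asarray(f(x + dx)) - fx) / h
    return J
</syntaxhighlight>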
    
Therefore, for neural networks, we assume that the computation from input to output is also uncertain, i.e., it likewise conforms to equation {{EquationNote|5}}: