更改

基于可逆性的因果涌现理论 (查看源代码)

2024年9月1日 (日) 16:49的版本

删除6,395字节、 2024年9月1日 (星期日)

无编辑摘要

第399行：第399行：

\gamma

</math>大幅上升。这表明在粗粒化过程中损失了大量信息，同时可以得到一个相对更有效的小型网络模型，具有更强的归一化近似动态可逆性。

−

~~=附录=~~

−

~~'''引理1：'''对于一个概率转移矩阵TPM <math>P=(P_{1}，P_{2}，...,P_{N})^{T}</math>，其中<math>P_{i}</math>是第i个行向量，那么：~~

−

~~<blockquote>~~

−

~~<math>~~

−

~~P_{i}\cdot P_{j}\le 1, \forall i,j\in [1,N]~~

−

~~</math>~~

−

~~</blockquote>~~

−

~~'''证明:''' 由于Pi是概率分布，因此它应满足归一化条件，可表示为：~~

−

~~<blockquote>~~

−

~~<math>~~

−

~~|P_{i}|_{1}=\sum_{j=1}^{N} p_{ij}=1~~

−

~~</math>~~

−

~~</blockquote>~~

−

~~其中<math>~~

−

~~|\cdot|_{1}~~

−

~~</math>是矢量的一阶范数，它被定义为所有元素的绝对值的总和。因此：~~

−

~~<blockquote>~~

−

~~<math>~~

−

~~P_{i}\cdot P_{j}=|P_{i}\cdot P_{j}|_{1}\le \cdot |P_{i}|_{1}\cdot |P_{j}|_{1}=1~~

−

~~</math>~~

−

~~</blockquote>~~

−

~~'''引理2：'''对于TPM P，我们可以用如下形式书写：~~

−

~~<blockquote>~~

−

~~<math>P=(P_{1},P_{2},...,P_{N})^{T}~~

−

~~</math>~~

−

~~</blockquote>~~

−

~~其中Pi是第i个行向量。然后假设P的奇异值为<math>~~

−

~~(\sigma_{1}\ge\sigma_{2}\ge...\ge\sigma_{N}\ge0)~~

−

~~</math>，那么，若：<math>~~

−

~~P_{i}\cdot P_{i}=1, \forall i\in [1,N],~~

−

~~</math>~~

−

~~那么P的所有奇异值满足：~~

−

~~<blockquote>~~

−

~~<math>~~

−

~~\sigma_{1}\ge\sigma_{2}\ge...\ge\sigma_{r}\ge 1~~

−

~~</math>~~

−

以及

−

~~<math>~~

−

~~\sigma_{r+1}=\sigma_{r+2}=...=\sigma_{N}=0~~

−

~~</math>~~

−

~~</blockquote>~~

−

~~其中r是矩阵P的秩。~~

−

~~'''证明:'''如果<math>~~

−

~~P_{i}\cdot P_{i}=1~~

−

~~</math>，则<math>~~

−

~~\sum_{j} p^{2}_{ij}=1~~

−

~~</math>，这意味着Pi是模为1的单位向量。而对于所有i<math>~~

−

~~|P_{i}|_{1}=\sum_{j=1}^{N} p_{ij}=1~~

−

~~</math>，因此Pi必须为独热向量。因此，P存在两种情况：1.存在<math>1≤i,j≤ N</math>，使得<math>Pi=Pj</math>；2.对于任意<math>1≤ i,j≤ N，Pi\neq Pj</math>。~~

−

~~情况一：若<math>P_{i}=P_{j}</math>且因为它们都为独热向量，所以<math>~~

−

~~P_{i}\cdot P_{j}=1~~

−

~~</math>以及<math>~~

−

~~P_{j}\cdot P_{i}=1~~

−

~~</math>（出于内积的对称性）。因此，第i行第j列的元素和第j行第i列的元素均为1，其他均为0。注意到，根据条件，<math>~~

−

~~P_{i}\cdot P_{i}=1~~

−

~~</math>同时成立，因此<math>~~

−

~~P\cdot P^{T}=1~~

−

~~</math>的矩阵具有以下形式：~~

−

~~<blockquote>~~

−

~~<math>~~

−

~~P\cdot P^{T}=\begin{matrix}~~

−

~~\\ \\i\\ \\j\\~~

−

~~\end{matrix}\begin{pmatrix}~~

−

~~1 & \cdots &\cdots &\cdots &\cdots &\cdots \\~~

−

~~\cdots & \ddots & \cdots &\cdots &\cdots &\cdots \\~~

−

~~\cdots &\cdots &1 &\cdots &1 &\cdots\\~~

−

~~\cdots &\cdots &\cdots &\ddots &\cdots &\cdots \\~~

−

~~\cdots &\cdots &1 &\cdots &1&\cdots\\~~

−

~~\cdots &\cdots &\cdots &\cdots &\cdots &\cdots~~

−

~~\end{pmatrix},~~

−

~~</math>~~

−

~~</blockquote>~~

−

~~其中，除了对角线元素和i、j和j、i处的元素外，所有元素均为0。从公式 A37 中，我们知道第 i 行与第 j 行相同。因此，<math>~~

−

~~P\cdot P^{T}~~

−

~~</math>的最小特征值（也是P的最后一个奇异值）必须为0。如果有多个（比如N − r）(i, j)对且<math>i\neq j</math>满足<math>~~

−

~~P_{i}\cdot P_{j}=1~~

−

~~</math>，那么最后N−r个奇异值全为零。P的秩为r。所以：~~

−

~~<blockquote>~~

−

~~<math>~~

−

~~\sigma_{r+1}=\sigma_{r+2}=...=\sigma_{N}=0~~

−

~~</math>~~

−

~~</blockquote>~~

−

~~在这种情况下，<math>~~

−

~~P\cdot P^{T}~~

−

~~</math>可以被写作：~~

−

~~<blockquote>~~

−

~~<math>~~

−

~~P\cdot P^{T}=\begin{pmatrix}~~

−

~~I_{(r-1)\times (r-1)},& \Omicron_{(r-1)\times (N-r+1)}\\~~

−

~~\Omicron_{(N-r+1)\times (r-1)}, & \mathbb{I}_{(N-r+1)\times (N-r+1)}~~

−

~~\end{pmatrix},~~

−

~~</math>~~

−

~~</blockquote>~~

−

~~其中<math>~~

−

~~I_{(r-1)\times (r-1)}~~

−

~~</math>是大小为(r − 1) × (r − 1)的单位矩阵，<math>~~

−

~~\Omicron~~

−

~~</math>和<math>~~

−

~~\mathbb{I}~~

−

~~</math>是所有元素分别均为0和1的矩阵。因此，P的奇异值按降序排列为：~~

−

~~<blockquote>~~

−

~~<math>\sigma=(\sqrt{N-r+1},1,1,...,1,0,0,...,0)~~

−

~~</math>~~

−

~~</blockquote>~~

−

~~因此：~~

−

~~<blockquote>~~

−

~~<math>~~

−

~~\sigma_{1}\ge\sigma_{2}\ge...\ge\sigma_{r}\ge 1~~

−

~~</math>~~

−

~~</blockquote>~~

−

~~情况2：如果<math>~~

−

~~P_{i}\cdot P_{j}=1~~

−

~~</math> 仅在 i = j 时成立，则非零元素都位于<math>~~

−

~~P\cdot P^{T}~~

−

~~</math>的对角线上，因此<math>~~

−

~~P\cdot P^{T}=I~~

−

~~</math>。此时<math>~~

−

~~P^{T}=P^{-1}~~

−

~~</math>，且 P 必须是置换矩阵，并且所有奇异值都是1。这也符合引理2的陈述。~~

−

~~'''引理3：'''对于给定的TPM P ，任何<math>\alpha\in (0, 2) </math>的动态可逆性<math>\Gamma_{\alpha}</math>的度量小于或等于系统的大小 N 。~~

−

~~'''证明：'''因为<math>0\le\alpha\le 2</math>，所以<math>f(x)=x^{\alpha /2}</math>是凹函数，根据命题 4，我们有：~~

−

~~<blockquote>~~

−

~~<math>~~

−

~~\begin{align}~~

−

~~\Gamma_{\alpha}=\sum_{i=1}^{N} \sigma_{i}^{\alpha}=N\cdot (\frac{\sum_{i=1}^{N} \sigma_{i}^{2}}{N})^{\alpha/2}=\nonumber\\~~

−

~~N^{1-\alpha/2} (\sum_{i=1}^{N} P_{i}^{2})^{\alpha/2}\le N^{1-\frac{\alpha}{2}+\frac{\alpha}{2}}=N~~

−

~~\end{align}~~

−

~~</math>~~

−

~~</blockquote>~~

−

~~'''引理4：'''当且仅当任何i的Pi行向量相同时，非零TPM P的秩才能达到最小值1。在这种情况下：~~

−

~~<math>\Gamma_{\alpha}=|P_{1}|^{\alpha}\cdot N_{\alpha/2}~~

−

~~</math>~~

−

~~'''证明：'''如果P的秩为1，则Pi的所有N-1个行向量，∀i∈[1, N]都可以用第一个行向量P1的线性函数来表示，因此~~

−

~~<blockquote>~~

−

~~<math>Pi = k·P1</math>~~

−

~~</blockquote>~~

−

~~其中 k > 0。~~

−

~~但是，因为<math>|Pi|_{1} = 1</math>，因此 k 必须为 1。因此：~~

−

~~<blockquote>~~

−

~~<math>Pi = Pj,\forall i, j\in [1,N]</math>~~

−

~~</blockquote>~~

−

~~另一方面，如果等式A51成立，则P的秩应该为1。在这种情况下，<math>~~

−

~~P\cdot P^{T}=|P_{1}|^{2}\cdot \mathbb{I}_{N\times N},~~

−

~~</math>， (A52) 因此，<math>~~

−

~~P\cdot P^{T}~~

−

~~</math>的特征值为<math>(|P_{1}|\cdot \sqrt{N},0,...,0)</math>。这直接导致了公式A49。~~

−

~~'''引理5：'''对于任何<math>x_{i}\ge 0,\forall i \in [1,N]</math>且<math>\alpha>0</math>,<math>f(\alpha)=(\sum_{i=1}^{N} x_{i}^{\alpha})^{1/\alpha}~~

−

~~</math>是关于<math>\alpha</math>的单调递减函数。~~

−

~~'''证明：'''因为：~~

−

~~<blockquote>~~

−

~~<math>~~

−

~~\sum_{i=1}^{N}(x_{i}^{\alpha})^{2}\le (\sum_{i=1}^{N} x_{i}^{\alpha})^{2},~~

−

~~</math>~~

−

~~</blockquote>~~

−

~~因此：~~

−

~~<blockquote>~~

−

~~<math>log\frac{\sum_{i=1}^{N} x_{i}^{2\alpha}}{\sum_{i=1}^{N} x_{i}^{\alpha}}\le log\sum_{i=1}^{N} x_{i}^{\alpha}.</math>~~

−

~~</blockquote>~~

−

~~更进一步，由于log是凹函数，因此：~~

−

~~<blockquote>~~

−

~~<math>~~

−

~~\sum_{i=1}^{N}(\frac{x_{i}^{\alpha}}{\sum_{j=1}^{N} x_{j}^{\alpha}})\cdot log x_{i}^{\alpha}\le log\sum_{i=1}^{N}(\frac{x_{i}^{\alpha}}{\sum_{j=1}^{N} x_{j}^{\alpha}}\cdot x_{i}^{\alpha}),~~

−

~~</math>~~

−

~~</blockquote>~~

−

~~因此，结合等式A56 和 A57，我们有：~~

−

~~<blockquote>~~

−

~~<math>\frac{\sum_{i=1}^{N} x_{i}^{\alpha}\cdot log x_{i}^{\alpha}}{\sum_{i=1}^{N} x_{i}^{\alpha}}\le log\sum_{i=1}^{N} x_{i}^{\alpha}.</math>~~

−

~~</blockquote>~~

=参考文献=

GongMingkang

140

个编辑