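Causal Geometry

Unlike discrete state spaces or networks, many real-world dynamical processes, such as bird flocks, stock prices, Brownian motion, and microbial survival rates, evolve in continuous state spaces.

Causal geometry studies how to measure causal emergence in continuous spaces. Its main step is to extend effective information (EI), a measure of causation, from discrete state spaces to random mappings over continuous state spaces. Within a causal functional model with random noise, it analyzes how EI is computed and under what conditions causal emergence arises. The computation of EI can also be extended from one dimension to higher dimensions, where matrix methods yield EI for high-dimensional models.

To remove the subjective influence of manually chosen parameters on the computed EI, the framework introduces the concept of a Riemannian manifold and analyzes the related properties of information geometry and causal geometry, which puts both the computation of EI and the assessment of causal emergence on a sounder footing.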

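Random Mappings on Continuous Spaces

Random Mappings and Observational Noise

A random mapping of the form [math]\displaystyle{ y=f(x)+\varepsilon, \varepsilon\sim\mathcal{N}(0,\epsilon^2) }[/math] can be split into two parts: a deterministic mapping and random noise. It maps the space [math]\displaystyle{ \mathcal{X} }[/math] in which [math]\displaystyle{ x }[/math] lives to the space [math]\displaystyle{ \mathcal{Y} }[/math] in which [math]\displaystyle{ y }[/math] lives, and it can be written as the transition probability [math]\displaystyle{ p(y|x)=\mathcal{N}(f(x),\epsilon^2) }[/math].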

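The deterministic part is a causal model of [math]\displaystyle{ y }[/math] as a function of [math]\displaystyle{ x }[/math], written [math]\displaystyle{ y=f(x) }[/math]. In essence it is a deterministic mapping from the space [math]\displaystyle{ \mathcal{X} }[/math] of [math]\displaystyle{ x }[/math] to the space [math]\displaystyle{ \mathcal{Y} }[/math] of [math]\displaystyle{ y }[/math], and it reflects the actual causal mechanism inherent in the system.

The random noise [math]\displaystyle{ \varepsilon\sim\mathcal{N}(0,\epsilon^2) }[/math] is the error caused by imperfect measuring instruments or by reading errors; noise arising from such errors is called observational noise. Observational noise introduces uncertainty into the system, turning [math]\displaystyle{ y }[/math] into a variable that is correlated with [math]\displaystyle{ f(x) }[/math] but random.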
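As a rough illustration of the split into a deterministic mapping plus observational noise, the sketch below draws repeated observations at one fixed input. The choice of `tanh` as the mechanism, the helper name `sample_random_mapping`, and the noise level are assumptions made only for this example.

```python
import numpy as np

def sample_random_mapping(f, x, eps, rng):
    """Draw y = f(x) + noise, with noise ~ N(0, eps^2), for each entry of x."""
    x = np.asarray(x, dtype=float)
    return f(x) + rng.normal(0.0, eps, size=x.shape)

rng = np.random.default_rng(0)
f = np.tanh                      # hypothetical deterministic mechanism
x0 = np.full(100_000, 0.3)       # observe the same input many times
y = sample_random_mapping(f, x0, eps=0.05, rng=rng)

# The samples scatter around f(x0) with a spread close to eps,
# which is what p(y|x0) = N(f(x0), eps^2) describes.
mean_y, std_y = y.mean(), y.std()
```

The sample mean estimates the deterministic part and the sample standard deviation estimates the noise scale.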

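Artificial Intervention and Intervention Noise

To better judge the causal relationship between two variables without interference from other variables, we introduce an intervention [math]\displaystyle{ do(x) }[/math] that sets the distribution of the independent variable [math]\displaystyle{ x }[/math]. The most common and most effective choice is to make [math]\displaystyle{ x }[/math] uniformly distributed, [math]\displaystyle{ do(x)\sim U[-L/2,L/2] }[/math], where [math]\displaystyle{ L }[/math] is a hyperparameter giving the width of the uniform distribution after the intervention.

To make the model more realistic, intervention noise is added to the input (intervened) variable [math]\displaystyle{ x }[/math]. The intervention noise is written [math]\displaystyle{ \xi\sim\mathcal{N}(0,\delta^2) }[/math], where [math]\displaystyle{ \delta }[/math] is the standard deviation of [math]\displaystyle{ \xi }[/math].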
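A minimal sketch of a noisy uniform intervention, assuming the intervention noise is simply added to the intended input. The function name `noisy_intervention` and the parameter values are hypothetical.

```python
import numpy as np

def noisy_intervention(n_samples, L, delta, rng):
    """Return (x, x_applied): intended inputs do(x) ~ U[-L/2, L/2] and the
    values actually applied after adding xi ~ N(0, delta^2)."""
    x = rng.uniform(-L / 2, L / 2, size=n_samples)
    xi = rng.normal(0.0, delta, size=n_samples)
    return x, x + xi

rng = np.random.default_rng(1)
x, x_applied = noisy_intervention(200_000, L=1.0, delta=0.1, rng=rng)

var_x = x.var()                  # close to L^2 / 12 for a uniform law
var_applied = x_applied.var()    # close to L^2 / 12 + delta^2
```

The applied input has a larger variance than the intended one, by roughly the variance of the intervention noise.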

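Effective Information (EI)

If only observational noise is present, with [math]\displaystyle{ L=1 }[/math] and [math]\displaystyle{ \epsilon\ll 1 }[/math], the effective information EI is given by the following expression, which keeps [math]\displaystyle{ L }[/math] explicit: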

[math]\displaystyle{ EI \approx \ln(\frac{L}{\sqrt{2\pi e}})+\frac{1}{2L}\int_{-L/2}^{L/2}\ln \left(\frac{f'(x)}{\epsilon}\right)^2dx. }[/math]
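The observation-noise formula can be evaluated numerically for a given derivative of the mapping. The sketch below uses a midpoint rule; the function name `ei_observation_noise` and the two example mappings (a straight line of slope 2, and a cubic with derivative 1 + x^2) are assumptions chosen for illustration. The result is in nats and is only meaningful where the derivative does not vanish.

```python
import numpy as np

def ei_observation_noise(f_prime, eps, L=1.0, n_grid=100_000):
    """Small-noise EI (nats) for y = f(x) + N(0, eps^2), do(x) ~ U[-L/2, L/2].

    Evaluates ln(L / sqrt(2*pi*e)) + (1 / (2L)) * integral of ln (f'(x)/eps)^2.
    """
    edges = np.linspace(-L / 2, L / 2, n_grid + 1)
    x = 0.5 * (edges[:-1] + edges[1:])                   # cell midpoints
    # The mean over a uniform grid approximates (1/L) * integral.
    avg_log = np.mean(np.log((f_prime(x) / eps) ** 2))
    return np.log(L / np.sqrt(2 * np.pi * np.e)) + 0.5 * avg_log

# Hypothetical example mappings.
ei_linear = ei_observation_noise(lambda x: np.full_like(x, 2.0), eps=0.01)
ei_cubic = ei_observation_noise(lambda x: 1.0 + x**2, eps=0.01)
```

For a linear mapping the integral is trivial, so the result reduces to ln(slope / eps) minus one half of ln(2*pi*e), which gives a quick sanity check on the quadrature.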

If both kinds of noise are considered, again with [math]\displaystyle{ L=1 }[/math] and [math]\displaystyle{ \epsilon\ll 1 }[/math], EI becomes:

[math]\displaystyle{ EI\approx -\frac{1}{2}\int_{-1/2}^{1/2}\ln\left[2\pi e\left(\left(\frac{\epsilon}{f'(x)}\right)^2+\delta^2\right)\right]dx. }[/math]

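This is the EI formula for a continuous mapping. When the noise level is low, a continuous mapping can achieve higher EI than a discrete one. As the noise level grows, however, discretizing the mapping can produce a model with higher EI. This helps explain why digital circuits ultimately outperform analog circuits at mitigating noise: their binarization and coarse-graining suppress the propagation and spread of noise.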

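Dimmer-Knob Noise and Causal Emergence

Application Example: The Lamp Dimmer Knob

We usually adjust the brightness [math]\displaystyle{ y }[/math] of a desk lamp by turning its knob to a position [math]\displaystyle{ x }[/math]. In fact, the brightness is not set directly by the position we choose; it depends on the resistance [math]\displaystyle{ \theta }[/math] of the sliding rheostat built into the knob.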

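In practice, both steps, the knob setting the resistance and the resistance setting the light intensity, carry some error. We describe the two errors by [math]\displaystyle{ y=\theta+\varepsilon }[/math] and [math]\displaystyle{ \theta=x+\xi }[/math]: the former is the observational error, and the latter is the intervention error, with [math]\displaystyle{ \xi }[/math] having standard deviation [math]\displaystyle{ \delta }[/math].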

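As the errors grow, the effective information of the system decreases. If, however, we treat the switch as a binary variable with only two states, on and off, its effective information falls more slowly, and this is where causal emergence arises.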
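The comparison between the continuous knob and its on/off coarse-graining can be sketched numerically. The example below is a toy model under stated assumptions, not the original analysis: the mechanism is the identity, so the two noises add up to one Gaussian of standard deviation `s`; the continuous EI is computed as the exact mutual information by numerical integration instead of a small-noise approximation; and the binary version assumes the knob is set only to the two end positions with equal probability and the brightness is thresholded at 0. All function names are hypothetical.

```python
import math
import numpy as np

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ei_continuous(s, n_grid=20_001):
    """Mutual information (nats) between x ~ U[-1/2, 1/2] and y = x + N(0, s^2).

    I(x; y) = H(y) - H(y|x), where H(y) is integrated numerically from the
    density p(y) = Phi((y + 1/2)/s) - Phi((y - 1/2)/s).
    """
    y = np.linspace(-0.5 - 8 * s, 0.5 + 8 * s, n_grid)
    p = np.array([std_normal_cdf((v + 0.5) / s) - std_normal_cdf((v - 0.5) / s)
                  for v in y])
    dy = y[1] - y[0]
    safe_p = np.where(p > 0, p, 1.0)            # avoids log(0); those terms are 0
    h_y = float(np.sum(-p * np.log(safe_p)) * dy)
    h_y_given_x = 0.5 * math.log(2 * math.pi * math.e * s * s)
    return h_y - h_y_given_x

def ei_binary(s):
    """EI (nats) of the on/off description: x in {-1/2, +1/2} with equal
    probability and y thresholded at 0, i.e. a binary symmetric channel."""
    p_err = std_normal_cdf(-0.5 / s)
    if p_err == 0.0:
        return math.log(2.0)
    h_b = -(p_err * math.log(p_err) + (1 - p_err) * math.log(1 - p_err))
    return math.log(2.0) - h_b

low = math.hypot(0.01, 0.01)     # combined std of both noises, low-noise case
high = math.hypot(0.2, 0.2)      # combined std of both noises, high-noise case
ei_cont_low, ei_bin_low = ei_continuous(low), ei_binary(low)
ei_cont_high, ei_bin_high = ei_continuous(high), ei_binary(high)
```

Under these assumptions the continuous description carries more EI at the low noise level, while the binary description carries more at the high one, which is the crossover described in the text.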

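Information Geometry

The expression for the EI of a continuous mapping can be extended to higher dimensions. Suppose [math]\displaystyle{ \mathbf{x}\in[-L/2,L/2]^n\subset\mathcal{R}^n }[/math] and [math]\displaystyle{ \mathbf{y}\in\mathcal{R}^m }[/math], where [math]\displaystyle{ n }[/math] and [math]\displaystyle{ m }[/math] are positive integers. When only observational noise is present, EI generalizes to: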

[math]\displaystyle{ EI\approx \ln\left(\frac{L^n}{(2\pi e)^{m/2}}\right)+\frac{1}{2}\mathbb{E}_{\mathbf{x}\sim U ([-\frac{L}{2},\frac{L}{2}]^n)}\ln\left|\det\left(\frac{\partial_\mathbf{x} f(\mathbf{x})}{\Sigma^{1/2}}\right)\right|^2, }[/math]

where [math]\displaystyle{ \Sigma }[/math] is the covariance matrix of the Gaussian noise [math]\displaystyle{ \varepsilon }[/math], [math]\displaystyle{ U([-L/2,L/2]^n) }[/math] denotes the uniform distribution on the hypercube [math]\displaystyle{ [-L/2,L/2]^n }[/math], [math]\displaystyle{ |\cdot| }[/math] is the absolute value, and [math]\displaystyle{ \det }[/math] is the determinant.
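For a linear mapping the Jacobian is constant, so the expectation in the high-dimensional formula drops out and EI can be computed from two log-determinants. The sketch below assumes a square, invertible matrix `A` (so n = m); the function name `ei_linear_gaussian` and the example matrices are hypothetical.

```python
import numpy as np

def ei_linear_gaussian(A, Sigma, L=1.0):
    """Small-noise EI (nats) for y = A x + noise, noise ~ N(0, Sigma),
    with x uniform on [-L/2, L/2]^n. Assumes A is square and invertible."""
    A = np.asarray(A, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    n = A.shape[0]
    _, logabsdet_A = np.linalg.slogdet(A)
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    # ln |det(Sigma^{-1/2} A)|^2 = 2 ln|det A| - ln det Sigma
    log_det_sq = 2.0 * logabsdet_A - logdet_Sigma
    return n * np.log(L) - 0.5 * n * np.log(2 * np.pi * np.e) + 0.5 * log_det_sq

# Hypothetical 2-D example: independent channels with different gains and noise.
A = np.diag([2.0, 0.5])
Sigma = np.diag([1e-4, 4e-4])
ei_2d = ei_linear_gaussian(A, Sigma)
ei_1d = ei_linear_gaussian([[2.0]], [[1e-4]])
```

With a diagonal `A` and `Sigma` the channels are independent, so the two-dimensional EI is the sum of the one-dimensional values for each channel.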

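Fisher Information

The conditional distribution of [math]\displaystyle{ \mathbf{y} }[/math] given [math]\displaystyle{ \mathbf{x} }[/math] is Gaussian, [math]\displaystyle{ p(\mathbf{y}|\mathbf{x})=\mathcal{N}(f(\mathbf{x}),\Sigma) }[/math]. Therefore, the term inside the expectation in the EI formula can be written as: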

[math]\displaystyle{ \left|\det\left(\frac{\partial_\mathbf{x} f(\mathbf{x})}{\Sigma^{1/2}}\right)\right|^2=\left|\det\left(\mathbb{E}_{\mathbf{y}|\mathbf{x}}\left[\partial_{\mu}\partial_{\nu}\ln p(\mathbf{y}|\mathbf{x})\right]\right)\right|, }[/math]

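The expectation on the right-hand side is exactly the negative of the Fisher information of the distribution [math]\displaystyle{ p(\mathbf{y}|\mathbf{x}) }[/math], so the expression equals the absolute value of the determinant of the Fisher information metric: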

[math]\displaystyle{ g_{\mu\nu}\equiv -\mathbb{E}_{\mathbf{y}|\mathbf{x}}\left[\partial_{\mu}\partial_{\nu}\ln p(\mathbf{y}|\mathbf{x})\right]. }[/math]
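For a Gaussian with input-independent covariance, the Fisher information metric has the closed form of the Jacobian transposed, times the inverse covariance, times the Jacobian. The sketch below checks numerically that its determinant matches the squared determinant that appears in the EI formula. The example map `f`, the covariance values, and the helper names are assumptions for illustration; the Jacobian is approximated by central finite differences.

```python
import numpy as np

def jacobian_fd(f, x, h=1e-6):
    """Central finite-difference Jacobian J[a, mu] = d f_a / d x_mu."""
    x = np.asarray(x, dtype=float)
    cols = []
    for mu in range(x.size):
        step = np.zeros_like(x)
        step[mu] = h
        cols.append((f(x + step) - f(x - step)) / (2 * h))
    return np.stack(cols, axis=1)

def fisher_metric_gaussian(f, x, Sigma):
    """g = J^T Sigma^{-1} J for p(y|x) = N(f(x), Sigma) with constant Sigma."""
    J = jacobian_fd(f, x)
    return J.T @ np.linalg.solve(Sigma, J)

def f(x):
    # Hypothetical smooth map from R^2 to R^2.
    return np.array([x[0] + 0.5 * x[1] ** 2, np.sin(x[0]) + 2.0 * x[1]])

Sigma = np.array([[0.04, 0.01], [0.01, 0.09]])
x0 = np.array([0.1, -0.2])

g = fisher_metric_gaussian(f, x0, Sigma)
J = jacobian_fd(f, x0)
lhs = np.linalg.det(J) ** 2 / np.linalg.det(Sigma)   # |det(Sigma^{-1/2} J)|^2
rhs = np.linalg.det(g)                                # |det(g)|
```

The two determinants agree up to finite-difference error, and `g` is symmetric and positive definite wherever the Jacobian is invertible.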

Riemannian Manifold

The Fisher information measures how sensitive the random mapping [math]\displaystyle{ p(\mathbf{y}|\mathbf{x}) }[/math] is to changes in [math]\displaystyle{ \mathbf{x} }[/math]; here [math]\displaystyle{ \partial_{\mu}\equiv\frac{\partial}{\partial \mathbf{x}_{\mu}} }[/math] denotes the partial derivative with respect to the [math]\displaystyle{ \mu }[/math]-th component of [math]\displaystyle{ \mathbf{x} }[/math]. It therefore defines a distance metric on the Riemannian manifold [math]\displaystyle{ \mathcal{M}=\{p(\mathbf{y}|\mathbf{x})\} }[/math], parameterized by [math]\displaystyle{ \mathbf{x}\in[-L/2,L/2]^n }[/math], which contains all possible distributions [math]\displaystyle{ p(\mathbf{y}|\mathbf{x}) }[/math]. This is where the word "geometry" in the framework comes from.

Finally, we can express EI in terms of the Fisher information metric:

[math]\displaystyle{ EI\approx \ln\frac{L^n}{(2\pi e)^{m/2}}+\frac{1}{2}\mathbb{E}_{\mathbf{x}\sim U ([-\frac{L}{2},\frac{L}{2}]^n)}\ln|\det(g_{\mu\nu})|. }[/math]

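This formula generalizes to the case where [math]\displaystyle{ p(\mathbf{y}|\mathbf{x}) }[/math] is not Gaussian. Once the distribution function is known, we can obtain its Fisher information metric and then compute EI. The reason is that the whole manifold of [math]\displaystyle{ p(\mathbf{y}|\mathbf{x}) }[/math] over all [math]\displaystyle{ \mathbf{x} }[/math] can be understood as a concatenation of local Gaussian distributions.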
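To illustrate how a Fisher information metric might be obtained for a non-Gaussian mapping, the sketch below estimates it by Monte Carlo for a scalar input. The Poisson model, the function names, and the step size are assumptions chosen for the example; for a Poisson distribution with mean x the Fisher information is known to be 1/x, which serves as a check.

```python
import numpy as np

def fisher_info_mc(log_p, sampler, x, n=200_000, h=1e-3, rng=None):
    """Monte Carlo estimate of g(x) = -E_{y|x}[d^2/dx^2 ln p(y|x)] for scalar x,
    using a central second difference in x."""
    rng = np.random.default_rng(0) if rng is None else rng
    y = sampler(x, n, rng)
    second = (log_p(y, x + h) - 2.0 * log_p(y, x) + log_p(y, x - h)) / h**2
    return -second.mean()

# Hypothetical non-Gaussian mapping: y | x ~ Poisson(x).
def log_poisson(y, x):
    # The -ln(y!) term is omitted: it does not depend on x, so it
    # cancels in the second difference.
    return y * np.log(x) - x

def sample_poisson(x, n, rng):
    return rng.poisson(x, size=n)

g_est = fisher_info_mc(log_poisson, sample_poisson, x=4.0)
g_exact = 1.0 / 4.0
```

Any log-density and sampler with the same call signatures could be substituted, provided the log-density is smooth in the input.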

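Causal Geometry

To extend information geometry to the case with both intervention noise and observational noise, we introduce a new intermediate variable [math]\displaystyle{ \theta\in\mathcal{R}^l }[/math] of dimension [math]\displaystyle{ l }[/math], so that [math]\displaystyle{ \mathbf{y} }[/math] can no longer be controlled by intervening on [math]\displaystyle{ \mathbf{x} }[/math] directly. Instead, we intervene on [math]\displaystyle{ \mathbf{x} }[/math] to influence [math]\displaystyle{ \theta }[/math], which in turn influences [math]\displaystyle{ \mathbf{y} }[/math]. The three variables thus form a Markov chain: [math]\displaystyle{ \mathbf{x}\to\theta\to\mathbf{y} }[/math].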

In this setting we obtain two manifolds: the effect manifold [math]\displaystyle{ \mathcal{M}_E=\{p(\mathbf{y}|\theta)\}_{\theta} }[/math], with metric [math]\displaystyle{ g_{\mu\nu}=-\mathbb{E}_{p(\mathbf{y}|\theta)}\partial_{\mu}\partial_{\nu}\ln p(\mathbf{y}|\theta) }[/math]; and the intervention manifold [math]\displaystyle{ \mathcal{M}_I=\{\tilde{q}(\mathbf{x}|\theta)\}_{\theta\in \Theta} }[/math], with metric [math]\displaystyle{ h_{\mu\nu}=-\mathbb{E}_{\tilde{q}(\mathbf{x}|\theta)}\partial_{\mu}\partial_{\nu}\ln \tilde{q}(\mathbf{x}|\theta) }[/math]. Here [math]\displaystyle{ \tilde{q}\equiv \frac{q(\theta|\mathbf{x})}{\int q(\theta|\mathbf{x})d\mathbf{x}} }[/math] and [math]\displaystyle{ \partial_{\mu}=\partial/\partial \theta_{\mu} }[/math]. Together, the effect and intervention manifolds are called the causal geometry.

The EI formula for causal geometry is:

[math]\displaystyle{ EI_g=\ln\frac{V_I}{(2\pi e)^{n/2}}-\frac{1}{2V_I}\int_\Theta\sqrt{|\det(h_{\mu\nu})|} \ln\left|\det\left( I_n+\frac{h_{\mu\nu}}{g_{\mu\nu}}\right)\right|d^l\theta, }[/math]

where we set [math]\displaystyle{ L=1 }[/math] and [math]\displaystyle{ m=l=n }[/math] to reduce the number of free parameters, [math]\displaystyle{ I_n }[/math] is the identity matrix of size [math]\displaystyle{ n }[/math], and [math]\displaystyle{ V_I=\int_\Theta\sqrt{|\det(h_{\mu\nu})|}d^l\theta }[/math].
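A one-dimensional instance of the causal-geometry formula can be evaluated by simple quadrature. The sketch below assumes a scalar chain in which the intermediate variable equals the input plus Gaussian intervention noise of standard deviation `delta`, and the output equals a function of the intermediate variable plus Gaussian observational noise of standard deviation `eps`, with the parameter domain taken as [-1/2, 1/2]. Under these assumptions, and ignoring boundary effects when normalizing over the input, the intervention metric is 1/delta^2 and the effect metric is (f'/eps)^2. The function name `ei_g_1d` is hypothetical.

```python
import numpy as np

def ei_g_1d(f_prime, eps, delta, n_grid=100_000):
    """EI_g (nats) for a scalar chain x -> theta -> y on Theta = [-1/2, 1/2].

    Assumed model (for illustration only):
      theta = x + xi,      xi ~ N(0, delta^2)  ->  h = 1 / delta^2
      y = f(theta) + e,    e  ~ N(0, eps^2)    ->  g = (f'(theta) / eps)^2
    """
    edges = np.linspace(-0.5, 0.5, n_grid + 1)
    theta = 0.5 * (edges[:-1] + edges[1:])      # cell midpoints
    dtheta = edges[1] - edges[0]
    h = np.full_like(theta, 1.0 / delta**2)     # intervention metric
    g = (f_prime(theta) / eps) ** 2             # effect metric
    sqrt_h = np.sqrt(h)
    V_I = np.sum(sqrt_h) * dtheta               # volume of the intervention manifold
    integral = np.sum(sqrt_h * np.log(1.0 + h / g)) * dtheta
    return np.log(V_I / np.sqrt(2 * np.pi * np.e)) - integral / (2.0 * V_I)

# Identity mechanism, as in the dimmer-knob example.
ei_identity = ei_g_1d(lambda t: np.ones_like(t), eps=0.02, delta=0.05)
```

In one dimension the ratio of the two metrics is an ordinary quotient, so no matrix inversion is needed; a higher-dimensional version would replace the quotient by a product with the inverse of the effect metric.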