Computational Mechanics

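Computational mechanics is a framework for quantifying emergence. It takes a computational view of how an observer's model changes as it identifies emergence. Built on ideas from information theory and biological evolution, it is the earliest quantitative approach to emergence to date. In applications, the theory proposes new measures of complexity and provides a machine-reconstruction algorithm for identifying emergence.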

History

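Computational mechanics grew out of research in nonlinear physics during the 1970s and early 1980s on turbulence in fluid dynamics. To identify chaotic dynamics in fluid turbulence, researchers developed a reconstruction method^[1]^[2] that uses a measured time series to reconstruct the state space of the fluid dynamical system, in which chaotic attractors can be observed and their degree of instability and associated complexity measured quantitatively. The method was validated experimentally in 1983^[3] and was then widely used to identify and quantify the behavior of deterministic chaotic systems. However, it cannot concisely express the internal structure of the reconstructed system. Computational mechanics refined and extended the method so that it can describe a system's internal structure and be applied to continuous chaotic systems. Computational mechanics was first proposed in a 1989 paper^[4], which defined a predictive equivalence relation based on state spaces reconstructed from time series^[1] and on automata theory^[5]^[6]^[7]. By using this relation to analyze time-series data and to identify and quantify its regular parts, computational mechanics can build a model that predicts the system's future behavior.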

Background

Emergence in natural and social phenomena

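Some natural and social phenomena are fascinating yet puzzling. Ants with simple behavior form complex societies, and a specialized division of labor arises spontaneously without any control center^[8]. Without a leader, flocks of birds fly in coordinated formation, and schools of fish swim in coherent arrays and suddenly turn together^[9]. Optimal pricing of goods in an economy appears to arise from agents obeying local commercial rules^[10]. How does the global coordination in these phenomena arise? Is there a common mechanism behind such different phenomena? In complex systems theory, the appearance of highly structured collective behavior from the interaction of many independent subsystems is called emergence.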

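Current theories of emergence include causal emergence theory based on effective information, causal emergence theory based on information decomposition, causal emergence theory based on reversibility, the theory of dynamical independence based on transfer entropy^[11], and G-emergence theory based on Granger causality^[12], among others. Computational mechanics is a quantitative theory of emergence based on statistical complexity. It was proposed earliest, and although its method differs from all of the theories above, many of its research ideas are similar. The concepts it defines, such as statistical complexity, causal states, and the ϵ-machine (pattern reconstruction machine), are a valuable source of inspiration and reference for research on emergence.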

Emergence in computational mechanics

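Intuitively, emergence means that a system exhibits new features, but this definition says neither what the "new feature" is nor in what sense it is "new". A more precise language is needed. Emergence is usually understood as a process that produces structure not directly described by the defining constraints and instantaneous forces that control the system. For example, a collection of randomly moving particles obeys equations of motion that describe the instantaneous forces on each particle, yet at the macroscopic scale it exhibits new features such as pressure, volume, and temperature. The notion of a pattern is needed to state what counts as a new feature. Otherwise the concept of emergence has almost no content, because almost any time-dependent system would exhibit emergent features.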

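In computational mechanics, a pattern usually means a regular structure extracted from a time series. In practice, the patterns that are detected are often implicitly assumed through the statistics the observer chooses. Some patterns may behave in agreement with a mathematical model, but such models themselves depend on a set of theoretical assumptions. In short, patterns are usually guessed: the observer anticipates structures from a fixed palette of regularities and then verifies them. By analogy with a communication channel, the observer is like a receiver who already holds the codebook. Any signal that cannot be decoded with the codebook is essentially noise, that is, a pattern the observer failed to recognize.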

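Among the coordinated behaviors inside a system, one kind of pattern becomes especially important: a pattern whose "novelty" is expressed in terms of the other structures of the same system. Because no external reference defines this novelty, the process can be called intrinsic emergence. For example, in an efficient capital market, agents control their individual production-investment and stock-ownership strategies according to the optimal pricing that emerges from collective behavior^[10]. For the agents' resource-allocation decisions it is essential that the prices emerging from the market's collective behavior are accurate signals that fully reflect all available information. What distinguishes intrinsic emergence is that the patterns formed give the system additional functionality that supports global information processing, such as setting optimal prices. More specifically, the emergent patterns are embedded directly in the system's nonlinear computational processes and can be used directly by the system itself.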

In summary, computational mechanics distinguishes three notions:

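The intuitive definition of emergence: any feature appearing in the system that can be called novel.
Pattern formation: regular structure that an observer identifies in the system.
Intrinsic emergence: the system itself captures and exploits the patterns that appear in it.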

An evolving system model

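Ideas from biological evolution can be used to pose the problem of intrinsic emergence and to explain how a highly ordered system emerges from chaos, although evolutionary theory still has limited predictive power when it comes to explaining the diversity of life forms. The system is therefore restricted to a deterministic dynamical system (DS) with well-defined structure and biological characteristics, and simplified to a model consisting of an environment and a set of adaptive observers, or "agents", so that the properties of an agent can be defined clearly. An agent tries to build and maintain an internal model of its environment with maximal predictive power. Each agent's environment is the collection of the other agents and can be regarded as a stochastic dynamical system (SDS). At any given moment the agent perceives a projection of the current environmental state. Over time, its sensory apparatus produces a series of measurements, which guide the agent in using its available resources (the "substrate" in the figure below) to build a model of the environment. Based on the regularities captured by the model, the agent acts through effectors and ultimately changes the state of the environment.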

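The figure above shows the agent-centric view of the environment. The universe can be regarded as a deterministic dynamical system; even though its rules and initial conditions are deterministic, the system becomes extremely complex as its scale grows. The environment seen by each agent is a stochastic dynamical system composed of all other agents. Its apparent randomness comes from intrinsic randomness and from the agent's limited computational resources. Each agent is itself a stochastic dynamical system, because it may sample, or be plagued by, uncontrollable randomness in its substrate and in environmental stimuli. The substrate represents the available resources that support and limit information processing, model building, and decision making. Arrows indicate the flow of information into and out of the agent.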

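The basic problem facing an agent is to model the state of its environment and predict the environment's future. This calls for a quantitative theory of how an agent processes information and builds models.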

Causal states

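An agent needs an effective description for processing the environmental information it receives, one that compresses this information into a finite state space stored in its internal model of the environment. To find such a description, we first define the concept of a "causal state".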

Definition of causal states

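An agent's measurements of the environment generally have finite precision, so the measurement results describe only a projection of the environmental state. The evolution of the environment from past to future can be described by a discrete stationary stochastic process whose realizations are bi-infinite sequences [math]\displaystyle{ \overleftrightarrow{S}=\cdots s_{-2} s_{-1} s_0 s_1 s_2\cdots }[/math]; in other words, a state here refers to a time series. At the current time [math]\displaystyle{ t }[/math], [math]\displaystyle{ \overleftrightarrow{S} }[/math] splits into a one-sided future sequence [math]\displaystyle{ \overrightarrow{s_t}=s_t s_{t+1} s_{t+2} s_{t+3}\cdots }[/math] and a one-sided past sequence [math]\displaystyle{ \overleftarrow{s_t}=\cdots s_{t-3} s_{t-2} s_{t-1} }[/math]. The set of all possible future sequences [math]\displaystyle{ \overrightarrow{s_t} }[/math] is denoted [math]\displaystyle{ \overrightarrow{S} }[/math], and the set of all possible histories [math]\displaystyle{ \overleftarrow{s_t} }[/math] is denoted [math]\displaystyle{ \overleftarrow{S} }[/math]. The state at a given time refers to the history up to that time.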

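A partition establishes the correspondence between the observed states (which may be called microstates) and the states of the latent space obtained after the agent's compression (which may be called macrostates). A partition is a mapping [math]\displaystyle{ \eta{:}\overleftarrow{S}\mapsto\mathcal{R} }[/math], where [math]\displaystyle{ \mathcal{R} }[/math] is a collection of subsets of the microstate space whose elements are mutually disjoint and whose union equals [math]\displaystyle{ \overset{\leftarrow}{S} }[/math]. Each subset produced by the partition can be regarded as one macrostate.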

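The figure above sketches one such partition, which divides the set [math]\displaystyle{ \overset{\leftarrow}{S} }[/math] into classes [math]\displaystyle{ \mathcal{R}=\{\mathcal{R}_i:i=1,2,3,4\} }[/math]. Note that the [math]\displaystyle{ \mathcal{R}_i }[/math] need not be compact sets; they may be Cantor sets or have other more exotic structure. The figure is drawn this way only for clarity.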

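The set [math]\displaystyle{ \overset{\leftarrow}{S} }[/math] can be partitioned in many ways. A partition that achieves the greatest predictive power while consuming the least computational resources is optimal, and the states obtained from this optimal partition are called causal states. A causal state is a pattern that the agent identifies, according to its internal model (in particular its state structure), after processing the measurements, and this pattern does not change over time. Formally, two times [math]\displaystyle{ t }[/math] and [math]\displaystyle{ t^{'} }[/math] are related, written [math]\displaystyle{ t\sim t^{'} }[/math], when the distribution of the future [math]\displaystyle{ \overrightarrow{s} }[/math] conditioned on the past [math]\displaystyle{ \overleftarrow{s_t} }[/math] equals the distribution of the future conditioned on the past [math]\displaystyle{ \overleftarrow{s_{t^{'}}} }[/math]. Here [math]\displaystyle{ \sim }[/math] denotes the equivalence relation induced by equivalent future distributions: [math]\displaystyle{ t\sim t^{'} \triangleq \Pr(\overrightarrow{s}\mid\overleftarrow{s_t})=\Pr(\overrightarrow{s}\mid\overleftarrow{s_{t^{'}}}) }[/math]. If [math]\displaystyle{ t }[/math] and [math]\displaystyle{ t^{'} }[/math] yield the same predictive distribution over futures, they are defined to be in the same causal state.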

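In the figure above, the numbers on the left are the state sequence at time [math]\displaystyle{ t }[/math], and the arrow shapes on the right represent the predicted distribution over future states. The arrow shapes at times [math]\displaystyle{ t_9 }[/math] and [math]\displaystyle{ t_{13} }[/math] are identical, so their predicted future distributions are the same and they are in the same causal state. By the same reasoning, the arrow shape at time [math]\displaystyle{ t_{11} }[/math] differs from those at [math]\displaystyle{ t_9 }[/math] and [math]\displaystyle{ t_{13} }[/math], so it is in a different causal state.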
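The equivalence relation can be approximated on a finite sample by grouping histories whose empirical future distributions agree. The following is a minimal sketch, not the reconstruction procedure of the cited papers: the function names, the finite history length `k`, the finite future length `f`, and the tolerance `tol` are all assumptions made for illustration, and the Golden Mean process (no two consecutive 0s) is used only as a convenient test source.

```python
import random
from collections import Counter, defaultdict

def golden_mean_sequence(n, seed=0):
    """Sample a binary sequence with no two consecutive 0s (illustrative source)."""
    rng = random.Random(seed)
    out, prev = [], "1"
    for _ in range(n):
        # After a 0 the next symbol must be 1; after a 1 it is 0 or 1 with equal odds.
        sym = "1" if prev == "0" or rng.random() < 0.5 else "0"
        out.append(sym)
        prev = sym
    return "".join(out)

def causal_state_partition(seq, k=2, f=2, tol=0.1):
    """Group length-k histories whose empirical length-f future distributions
    agree within tol. This is a finite-sample stand-in for the relation ~."""
    counts = defaultdict(Counter)
    for i in range(k, len(seq) - f + 1):
        counts[seq[i - k:i]][seq[i:i + f]] += 1
    futures = sorted({fut for c in counts.values() for fut in c})
    dists = {h: [c[x] / sum(c.values()) for x in futures] for h, c in counts.items()}
    states = []  # each entry: (representative distribution, member histories)
    for h in sorted(dists):
        for rep, members in states:
            if max(abs(p - q) for p, q in zip(rep, dists[h])) <= tol:
                members.append(h)
                break
        else:
            states.append((dists[h], [h]))
    return [members for _, members in states]

print(causal_state_partition(golden_mean_sequence(20000)))
print(causal_state_partition("001" * 40))
```

In this sketch the histories ending in 1 are merged because they predict the same futures, while the history ending in 0 stays separate. For a period-3 sequence every length-2 history lands in its own class, one per phase of the cycle.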

The ϵ-machine (pattern reconstruction machine)

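How should an agent process its measurements in order to identify the causal states in them? To solve this problem, computational mechanics introduces a model called the ϵ-machine (pattern reconstruction machine), which reconstructs the sequences in the measurements and identifies their causal states after removing random noise. Formally, [math]\displaystyle{ M=(\mathcal{S},T) }[/math], where [math]\displaystyle{ \mathcal{S} }[/math] is the set of causal states induced by the causal-state partition mapping [math]\displaystyle{ \epsilon }[/math], and [math]\displaystyle{ T }[/math] is the set of state-to-state mappings satisfying [math]\displaystyle{ S_{t+1}=TS_t }[/math], with [math]\displaystyle{ S }[/math] any causal state in [math]\displaystyle{ \mathcal{S} }[/math]. [math]\displaystyle{ T }[/math] plays the role of a coarse-grained macroscopic dynamics. [math]\displaystyle{ T_{ij}^{\left ( s \right )} }[/math] is the probability of a transition from causal state [math]\displaystyle{ \mathcal{S}_i }[/math] to causal state [math]\displaystyle{ \mathcal{S}_j }[/math] while emitting the symbol [math]\displaystyle{ s }[/math]: [math]\displaystyle{ T_{ij}^{(s)}\equiv\mathrm{P}(\mathcal{S}'=\mathcal{S}_j,\stackrel{\to}{S}^1=s|\mathcal{S}=\mathcal{S}_i) }[/math]. Each [math]\displaystyle{ \mathcal{S} }[/math] comes with an [math]\displaystyle{ \epsilon }[/math] mapping and a [math]\displaystyle{ T }[/math] function, which together form the ordered pair [math]\displaystyle{ \left \{ \epsilon,T \right \} }[/math]. Learning [math]\displaystyle{ \epsilon }[/math] and [math]\displaystyle{ T }[/math] improves the accuracy with which the machine identifies causal states.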
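To make the labeled transition probabilities concrete, here is a small sketch that stores them as nested lists for a two-state machine. The example machine (the Golden Mean process, which never emits two 0s in a row), the dictionary layout `T[s][i][j]`, and all function names are assumptions chosen for illustration; they are not taken from the cited papers.

```python
# T[s][i][j] = P(next state = j, emitted symbol = s | current state = i)
# State 0: last symbol was 1. State 1: last symbol was 0 (so a 1 must follow).
T = {
    "0": [[0.0, 0.5], [0.0, 0.0]],
    "1": [[0.5, 0.0], [1.0, 0.0]],
}

def state_transition_matrix(T):
    """Sum the symbol-labeled matrices to get the state-to-state matrix."""
    n = len(next(iter(T.values())))
    return [[sum(T[s][i][j] for s in T) for j in range(n)] for i in range(n)]

def stationary_distribution(P, iters=1000):
    """Power iteration on the row-stochastic matrix P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def word_probability(T, pi, word):
    """Probability that the machine, started from pi, emits the given word."""
    v = list(pi)
    for s in word:
        v = [sum(v[i] * T[s][i][j] for i in range(len(v))) for j in range(len(v))]
    return sum(v)

P = state_transition_matrix(T)
pi = stationary_distribution(P)
print(P, pi, word_probability(T, pi, "00"))
```

Each row of the summed matrix adds up to one, and the forbidden word "00" receives probability zero, which is the kind of structural constraint the machine is meant to capture.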

Quantifying model complexity

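While building and optimizing an ϵ-machine, an agent cannot grow the model without bound because its computational resources are limited. A measure of model complexity is therefore needed to monitor and adjust the size of the model so that it fits the available resources.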

Kolmogorov complexity

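Among the many existing complexity measures, Kolmogorov complexity comes closest to what is required. The Kolmogorov complexity [math]\displaystyle{ K(x) }[/math] is the length of the shortest program that generates the string [math]\displaystyle{ x }[/math] when run on a universal deterministic Turing machine (UTM). It has two obvious problems. The first is that it is uncomputable. Because of the halting problem, no general algorithm can decide whether an arbitrary program halts on a given input, so there is no general way to find, or to verify, the shortest program that describes [math]\displaystyle{ x }[/math]. Even if we find a program that describes [math]\displaystyle{ x }[/math], we cannot guarantee that it is the shortest: checking would require examining all possible programs, which is computationally infeasible. The second problem is that it may fail to measure structure and dynamical properties, because [math]\displaystyle{ K(x) }[/math] must account for every bit in the string, including bits generated by randomness. If most of [math]\displaystyle{ x }[/math] is generated randomly, the features and important structure of [math]\displaystyle{ x }[/math] are masked.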
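Although [math]\displaystyle{ K(x) }[/math] itself cannot be computed, the second problem can be illustrated with an ordinary compressor, whose output length is only a crude upper-bound proxy. The sketch below uses Python's standard `zlib` module for that purpose; the choice of compressor, the function name, and the two test strings are assumptions made for illustration.

```python
import random
import zlib

def compressed_length(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (a rough proxy, not K(x))."""
    return len(zlib.compress(data, 9))

n = 10_000
periodic = b"01" * (n // 2)                          # highly regular string
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(n))  # pseudo-random bytes

print(compressed_length(periodic), compressed_length(noisy))
```

The pseudo-random string barely compresses, so a description-length measure ranks it as far more "complex" than the periodic one even though it contains no structure worth modeling.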

Statistical complexity

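To address these two problems of Kolmogorov complexity, computational mechanics introduces the statistical complexity [math]\displaystyle{ C_\mu(x) }[/math] to quantify the complexity of a model. It reflects the structure and dynamical properties of [math]\displaystyle{ x }[/math] and measures the computational resources needed for effective modeling. It is defined as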

[math]\displaystyle{ C_{\mu}(x)=\left\|M_{\min }(x \mid \mathrm{BTM})\right\| }[/math]

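[math]\displaystyle{ C_{\mu}(x) }[/math] is the statistical complexity^[13], a complexity measure for the time series [math]\displaystyle{ x }[/math]. It reflects the complexity of the minimal model at a given precision [math]\displaystyle{ \mu }[/math]. [math]\displaystyle{ M_{\min }(x \mid \mathrm{BTM}) }[/math] is the minimal model in the setting of a Bernoulli-Turing machine (BTM)^[14], used to capture the patterns in the sequence [math]\displaystyle{ x }[/math]; it is the simplest model that can effectively predict [math]\displaystyle{ x }[/math]. The symbol [math]\displaystyle{ \left\|\cdot\right\| }[/math] denotes a measure of model size, which may be based on the number of model states, the number of parameters, or the computational resources used.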

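In contrast to Kolmogorov complexity, the statistical complexity [math]\displaystyle{ C_\mu(x) }[/math] discounts the computational effort that a universal Turing machine spends on simulating random bits. A characteristic of statistical complexity is that [math]\displaystyle{ C_\mu(x)=0 }[/math] for completely random objects, such as a sequence of coin flips. For a simple periodic process such as [math]\displaystyle{ x=00000000\ldots 0 }[/math], [math]\displaystyle{ C_\mu(x)=0 }[/math] as well. Statistical complexity is therefore small for both (simple) periodic processes and completely random processes.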

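The figure below contrasts Kolmogorov complexity and statistical complexity as the sequence ranges from simply periodic to completely random. As shown in panel (a), Kolmogorov complexity is a monotonically increasing function of the randomness in the process; it measures how unpredictable the information source is, and this randomness can be quantified by the Shannon entropy rate. In contrast, statistical complexity is zero at both extremes and reaches its maximum in between (panel (b)). It rests on the view that randomness is statistically simple: a completely random process has zero statistical complexity. Periodicity is also statistically simple: a completely periodic process has low statistical complexity. Complex processes arise between these two extremes as a mixture of predictable and random mechanisms, and data with an intermediate degree of randomness has the greatest statistical complexity.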
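The contrast between the two measures can be checked numerically on a few small machines. The sketch below assumes the entropy-of-states form of statistical complexity used later in this article, [math]\displaystyle{ C_\mu\equiv H[\mathcal{S}] }[/math], and computes the entropy rate as the state-averaged entropy of the next symbol. The three example processes and their hand-written state distributions are assumptions chosen for illustration.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return sum(-p * log2(p) for p in probs if p > 0)

# name: (stationary causal-state probabilities, next-symbol distribution per state)
processes = {
    "fair coin":   ([1.0], [[0.5, 0.5]]),
    "period-2":    ([0.5, 0.5], [[1.0], [1.0]]),
    "golden mean": ([2 / 3, 1 / 3], [[0.5, 0.5], [1.0]]),
}

def complexity_and_entropy_rate(state_probs, emissions):
    """Return (C_mu, h_mu) for a machine given in the format above."""
    c_mu = entropy(state_probs)
    h_mu = sum(p * entropy(e) for p, e in zip(state_probs, emissions))
    return c_mu, h_mu

for name, (states, emissions) in processes.items():
    c_mu, h_mu = complexity_and_entropy_rate(states, emissions)
    print(f"{name:12s} C_mu={c_mu:.3f} bits  h_mu={h_mu:.3f} bits/symbol")
```

In this sketch the fair coin has maximal entropy rate but zero statistical complexity, the period-2 process has zero entropy rate and one bit of complexity (its phase), and the Golden Mean process sits in between on the randomness axis.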

Shannon entropy rate

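The Shannon entropy rate is a concept from information theory, usually used to measure the amount of information a source or stochastic process produces per unit time, that is, the uncertainty of the source. It extends Shannon entropy and describes the average information per symbol of a time series (such as a stochastic process). If the object being measured is a discrete symbol sequence [math]\displaystyle{ s^L }[/math] of length [math]\displaystyle{ L }[/math] generated by a source (for example a Markov chain), Kolmogorov complexity and the Shannon entropy rate [math]\displaystyle{ h_\mu }[/math] are related by: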

[math]\displaystyle{ \frac{K\left(s^{L}\right)}{L}\underset{L\to\infty}{\longrightarrow}h_{\mu} }[/math]

Here the Shannon entropy rate is defined as:

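[math]\displaystyle{ h_\mu=\lim_{L\to\infty}\frac{H(\Pr(s^L))}L }[/math]

where [math]\displaystyle{ \Pr(s^L) }[/math] is the marginal distribution of [math]\displaystyle{ s^L }[/math] and [math]\displaystyle{ H }[/math] is the Shannon entropy, that is, the average self-information. In the modeling framework, [math]\displaystyle{ h_\mu }[/math] is a normalized measure of uncertainty: the higher the uncertainty, the larger the entropy rate. Here it can be interpreted as the agent's error rate in predicting the next symbols of the sequence [math]\displaystyle{ s^L }[/math].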
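On a finite sample the limit can only be approximated. A common stand-in is the block-entropy ratio [math]\displaystyle{ H(L)/L }[/math] computed from empirical frequencies of length-[math]\displaystyle{ L }[/math] blocks. The following is a minimal sketch of that estimate; the function names are illustrative, and the estimate is biased when the sample is short relative to the number of possible blocks.

```python
from collections import Counter
from math import log2

def block_entropy(seq, L):
    """Empirical Shannon entropy (bits) of the length-L blocks of seq."""
    blocks = Counter(seq[i:i + L] for i in range(len(seq) - L + 1))
    n = sum(blocks.values())
    return sum(-(c / n) * log2(c / n) for c in blocks.values())

def entropy_rate_estimate(seq, L):
    """Finite-L estimate H(L)/L of the entropy rate h_mu."""
    return block_entropy(seq, L) / L

periodic = "01" * 100
for L in (1, 4, 16, 50):
    print(L, round(entropy_rate_estimate(periodic, L), 4))
```

For the period-2 sequence the block entropy stays near one bit (the phase) regardless of [math]\displaystyle{ L }[/math], so the ratio shrinks toward zero as [math]\displaystyle{ L }[/math] grows, consistent with a zero entropy rate.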

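Kolmogorov complexity, statistical complexity, and the Shannon entropy rate, the three complexity-related measures introduced above, satisfy an approximate relation: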

[math]\displaystyle{ K(s^L)\approx C_\mu(s^L)+h_\mu L }[/math]

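Once a description language (program) is fixed, the Kolmogorov complexity [math]\displaystyle{ K(s^L) }[/math] can be understood as the total amount of information used to describe [math]\displaystyle{ s^L }[/math], and [math]\displaystyle{ h_\mu L }[/math] as the amount of information that is allowed to be lost. The statistical complexity [math]\displaystyle{ C_\mu(s^L) }[/math] is then the minimum amount of information needed to describe [math]\displaystyle{ s^L }[/math] when an error rate of [math]\displaystyle{ h_\mu }[/math] is tolerated.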

Main properties of causal states

The causal-state partition mapping is denoted [math]\displaystyle{ \epsilon }[/math], with [math]\displaystyle{ \epsilon{:}\overleftarrow{S}\mapsto2^{\overset{\leftarrow}{S}} }[/math], where [math]\displaystyle{ 2^{\overset{\leftarrow}{S}} }[/math] is the power set of [math]\displaystyle{ \overleftarrow{S} }[/math]. By the definition of causal states: [math]\displaystyle{ \epsilon(\stackrel{\leftarrow}{s})\equiv\{\stackrel{\leftarrow}{s}^{\prime}\mid\mathrm{P}(\stackrel{\rightarrow}{S}=\stackrel{\rightarrow}{s}\mid\stackrel{\leftarrow}{S}=\stackrel{\leftarrow}{s})=\mathrm{P}(\stackrel{\rightarrow}{S}=\stackrel{\rightarrow}{s}\mid\stackrel{\leftarrow}{S}=\stackrel{\leftarrow}{s}^{\prime}),\ \mathrm{for~all~}\overrightarrow{s}\in\overrightarrow{S},\stackrel{\leftarrow}{s}^{\prime}\in\stackrel{\leftarrow}{S}\} }[/math]. The set of causal states, that is, the range of [math]\displaystyle{ \epsilon }[/math], is denoted [math]\displaystyle{ \mathcal{S} }[/math], and [math]\displaystyle{ \stackrel{\leftarrow}{S} }[/math] is the random variable for the history, with realization [math]\displaystyle{ \stackrel{\leftarrow}{s} }[/math]. Causal states have the following properties:

Property 1 (causal states are maximally predictive)

For every set of states [math]\displaystyle{ \mathcal{R} }[/math] obtained from a partition and every positive integer [math]\displaystyle{ L }[/math], [math]\displaystyle{ H[\stackrel{\rightarrow}{S}^L|\mathcal{R}]\geq H[\stackrel{\rightarrow}{S}^L|\mathcal{S}] }[/math], where [math]\displaystyle{ \stackrel{\rightarrow}{S}^L }[/math] is the future sequence of length [math]\displaystyle{ L }[/math], and [math]\displaystyle{ H[\stackrel{\rightarrow}{S}^L|\mathcal{R}] }[/math] and [math]\displaystyle{ H[\stackrel{\rightarrow}{S}^L|\mathcal{S}] }[/math] are conditional entropies of [math]\displaystyle{ \stackrel{\rightarrow}{S}^L }[/math]. In other words, among all sets of states [math]\displaystyle{ \mathcal{R} }[/math] obtained by partitioning, the causal states [math]\displaystyle{ \mathcal{S} }[/math] have the greatest predictive power. The proof is as follows:

By the definition of causal states,

[math]\displaystyle{ \epsilon(\stackrel{\leftarrow}{s})\equiv\{\stackrel{\leftarrow}{s}^{\prime}\mid\mathrm{P}(\stackrel{\rightarrow}{S}=\stackrel{\rightarrow}{s}\mid\stackrel{\leftarrow}{S}=\stackrel{\leftarrow}{s})=\mathrm{P}(\stackrel{\rightarrow}{S}=\stackrel{\rightarrow}{s}\mid\stackrel{\leftarrow}{S}=\stackrel{\leftarrow}{s}^{\prime})\} }[/math]

so conditioning on the causal state is equivalent to conditioning on the history itself:

[math]\displaystyle{ \mathrm{P}(\stackrel{\rightarrow}{S}=\stackrel{\rightarrow}{s} |\mathcal{S}=\epsilon(\stackrel{\leftarrow}{s}))=\mathrm{P}(\stackrel{\rightarrow}{S}=\stackrel{\rightarrow}{s}\mid\stackrel{\leftarrow}{S}=\stackrel{\leftarrow}{s}) }[/math]

and therefore

[math]\displaystyle{ H[\stackrel{\rightarrow}{S}^L|\mathcal{S}]~=~H[\stackrel{\rightarrow}{S}^L|~\stackrel{\leftarrow}{S}] }[/math]

Since every [math]\displaystyle{ \mathcal{R} }[/math] is a function of the history, conditioning on [math]\displaystyle{ \mathcal{R} }[/math] cannot leave less uncertainty than conditioning on the full history:

[math]\displaystyle{ H[\stackrel{\to}{S}^L|\mathcal{R}] \geq H[\stackrel{\to}{S}^L|\stackrel{\leftarrow}{S}] }[/math]

Combining the last two relations gives

[math]\displaystyle{ H[\stackrel{\rightarrow}{S}^L|\mathcal{R}]\geq H[\stackrel{\rightarrow}{S}^L|\mathcal{S}] }[/math]

Property 2 (causal states have minimal statistical complexity)

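Let [math]\displaystyle{ \hat{\mathcal{R}} }[/math] be a set of states for which equality holds in the inequality of Property 1. Then for all such [math]\displaystyle{ \hat{\mathcal{R}} }[/math], [math]\displaystyle{ C_\mu(\hat{\mathcal{R}})\geq C_\mu(\mathcal{S}) }[/math]. In other words, among all sets of states [math]\displaystyle{ \mathcal{R} }[/math] with the same predictive power, the causal states [math]\displaystyle{ \mathcal{S} }[/math] have the smallest statistical complexity. The proof is as follows: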

For any [math]\displaystyle{ \mathcal{R} }[/math], if [math]\displaystyle{ H[\stackrel{\rightarrow}{S}^L|\mathcal{R}]= H[\stackrel{\rightarrow}{S}^L|\mathcal{S}] }[/math], then there exists a function [math]\displaystyle{ g }[/math] such that [math]\displaystyle{ \mathcal{S}=g(\mathcal{R}) }[/math] always holds.

By the definition of [math]\displaystyle{ \mathcal{R} }[/math], [math]\displaystyle{ H[\vec{S}^L|\mathcal{R}]\lt LH[S] }[/math], so these conditional entropies are finite. In addition, for any random variable [math]\displaystyle{ X }[/math] and any function [math]\displaystyle{ f }[/math], [math]\displaystyle{ H[f(X)]\leqslant H[X] }[/math].

Hence [math]\displaystyle{ H[\mathcal{S}]=H[g(\hat{\mathcal{R}})]\leqslant H[\hat{\mathcal{R}}] }[/math]

By the definition of statistical complexity, [math]\displaystyle{ C_\mu(\mathcal{R})\equiv H[\mathcal{R}] }[/math], so [math]\displaystyle{ C_\mu(\hat{\mathcal{R}})=H[\hat{\mathcal{R}}] }[/math] and [math]\displaystyle{ C_\mu(\mathcal{S})=H[\mathcal{S}] }[/math].

Therefore [math]\displaystyle{ C_\mu(\hat{\mathcal{R}})\geq C_\mu(\mathcal{S}) }[/math]

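Combined with this property, computing [math]\displaystyle{ C_\mu(s^L) }[/math] in the formula [math]\displaystyle{ K(s^L)\approx C_\mu(s^L)+h_\mu L }[/math] amounts to computing the statistical complexity of the causal states corresponding to [math]\displaystyle{ s^L }[/math]. That is, to compute [math]\displaystyle{ C_\mu(s^L) }[/math] one must first find the causal states of [math]\displaystyle{ s^L }[/math]. The formula can also be read as: total information in the sequence [math]\displaystyle{ s^L }[/math] ≈ information captured by the causal states + random information left unmodeled.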
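Properties 1 and 2 can be checked numerically on a small example for the one-step case [math]\displaystyle{ L=1 }[/math]. The sketch below compares three partitions of the length-2 histories of the Golden Mean process (no two consecutive 0s): the causal-state partition, a finer one, and a coarser one. The joint probabilities are worked out by hand for this process, and the process, the table, and the function names are all assumptions made for illustration.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return sum(-p * log2(p) for p in probs if p > 0)

# joint[(history, next_symbol)] for the Golden Mean process, length-2 histories.
# Histories ending in 1 are followed by 0 or 1 equally often; "10" forces a 1.
joint = {
    ("01", "0"): 1 / 6, ("01", "1"): 1 / 6,
    ("11", "0"): 1 / 6, ("11", "1"): 1 / 6,
    ("10", "1"): 1 / 3,
}

def evaluate(partition):
    """Return (H[next symbol | R], H[R]) for a partition given as a list of sets."""
    conditional, state_probs = 0.0, []
    for block in partition:
        nxt = {}
        for (hist, sym), p in joint.items():
            if hist in block:
                nxt[sym] = nxt.get(sym, 0.0) + p
        pr = sum(nxt.values())
        state_probs.append(pr)
        conditional += pr * entropy([p / pr for p in nxt.values()])
    return conditional, entropy(state_probs)

causal = evaluate([{"01", "11"}, {"10"}])
finer = evaluate([{"01"}, {"11"}, {"10"}])
coarser = evaluate([{"01", "11", "10"}])
print(causal, finer, coarser)
```

In this example the finer partition predicts exactly as well as the causal states but has a larger state entropy (Property 2), while the coarser partition has a smaller state entropy but predicts strictly worse (Property 1).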

Property 3 (causal states are minimally stochastic)

Let [math]\displaystyle{ \hat{\mathcal{R}} }[/math] and [math]\displaystyle{ \hat{\mathcal{R}}^{\prime} }[/math] be states for which equality holds in the inequality of Property 1. Then for all such [math]\displaystyle{ \hat{\mathcal{R}} }[/math] and [math]\displaystyle{ \hat{\mathcal{R}}^{\prime} }[/math], [math]\displaystyle{ H[\hat{\mathcal{R}}^{\prime}|\hat{\mathcal{R}}]\geq H[\mathcal{S}^{\prime}|\mathcal{S}] }[/math], where [math]\displaystyle{ \hat{\mathcal{R}}^{\prime} }[/math] and [math]\displaystyle{ \mathcal{S}^{\prime} }[/math] are the next-time state and the next-time causal state of the process, respectively. In other words, among all sets of states [math]\displaystyle{ \mathcal{R} }[/math] with the same predictive power, the causal states [math]\displaystyle{ \mathcal{S} }[/math] have the least randomness.

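In terms of mutual information, the inequality above is equivalent to [math]\displaystyle{ I(\mathcal{S}^{\prime};\mathcal{S})\geq I(\hat{\mathcal{R}}^{\prime};\hat{\mathcal{R}}) }[/math]: among all such sets of states, the causal states have the greatest mutual information with their own next-time state.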

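To compute the statistical complexity [math]\displaystyle{ C_\mu(x) }[/math] of a model, we need a way to compress the description of the environmental information as much as possible. The properties of causal states meet exactly this requirement: converting the environmental information into causal states as fully as possible yields the model's minimal statistical complexity. For a deeper treatment, see the paper by Cosma Rohilla Shalizi and James Crutchfield^[15], which contains further properties of causal states together with formal proofs.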

Model innovation and reconstruction

Model innovation

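Because an agent's computational resources are limited, when the amount of measurement data exceeds what the model can handle, the existing model must be innovated so that the agent can still predict its environment effectively with the same resources. Innovation works mainly by looking for similarities among groups of states in the existing model, abstracting higher-level groups of causal states from the causal states the existing model has identified. The table below lists one route of model innovation: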

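The table above shows a hierarchy of causal time-series models. It shows a possible first four levels, but the process is open-ended and need not stop at four. Each level is defined by its model class. A model consists of states (circles or squares) and transitions (labeled arrows), and each model has a unique start state, indicated by an inscribed circle. The data stream itself is the lowest level. A tree of depth [math]\displaystyle{ D }[/math] is built from the data stream by grouping sequential measurements into recurring subsequences. The next level, a finite automaton (FA) with states and transitions, is reconstructed from the tree by grouping tree nodes. The last level shown, the string production machine (PM), is built by grouping FA states and inferring production rules that manipulate strings held in registers.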

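Consider a data stream [math]\displaystyle{ s }[/math] of [math]\displaystyle{ m }[/math] measurements. If it is periodic, the Level 0 representation, which is the measurements themselves, depends on [math]\displaystyle{ m }[/math]. In the limit [math]\displaystyle{ m\to\infty }[/math], Level 0 yields an infinite representation. Level 0 is of course the most accurate model of the data, but it is of almost no help and hardly deserves to be called a "model". By contrast, a tree of depth [math]\displaystyle{ D }[/math] gives a finite representation even for an infinitely long data stream, provided the period of the data stream is at most [math]\displaystyle{ D }[/math]. The tree has paths of length [math]\displaystyle{ D }[/math] given by the period of the data stream, and each path corresponds to a different phase of the repeating pattern in [math]\displaystyle{ s }[/math]. If [math]\displaystyle{ s }[/math] is aperiodic, the tree model class is no longer finite and no longer independent of [math]\displaystyle{ m }[/math]. In fact, if the data stream has positive entropy ([math]\displaystyle{ h_\mu \gt 0 }[/math]), the size of the tree grows exponentially, [math]\displaystyle{ \approx\left\|\mathcal{A}\right\|^{Dh_{\mu}} }[/math], as [math]\displaystyle{ D }[/math] is increased to account for subsequences of increasing length in [math]\displaystyle{ s }[/math]. Roughly speaking, if the correlations in the data stream decay fast enough over time, the next level, (stochastic) finite automata, gives a finite representation. The number of states [math]\displaystyle{ \left\|\mathrm{V}\right\| }[/math] indicates the amount of memory in the data stream, and hence the typical time over which measurements in [math]\displaystyle{ s }[/math] are correlated. It is also possible that Level 2 does not give a finite representation, in which case another level (Level 3) is needed.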

Model reconstruction algorithm

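The ϵ-machine was introduced above as the agent's means of identifying causal states. Combined with the concept of model innovation, it can be given a complete definition: an ϵ-machine is the model of minimal complexity that gives a finite description of the measurements using the least computational resources. The algorithm proceeds as follows: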

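1. At the lowest level, set the Level 0 model to the data itself, [math]\displaystyle{ M_0=s }[/math], and set the initial level [math]\displaystyle{ l }[/math] to one above Level 0, [math]\displaystyle{ l=1 }[/math];

2. Reconstruct the model from the level below, [math]\displaystyle{ M_l=M_{l-1}/\sim }[/math], where [math]\displaystyle{ \sim }[/math] is the causal equivalence relation at level [math]\displaystyle{ l }[/math]. This means that states treated as distinct at level [math]\displaystyle{ l-1 }[/math] may be treated as the same causal state at level [math]\displaystyle{ l }[/math]. At this point both [math]\displaystyle{ \mathcal{S} }[/math] and [math]\displaystyle{ T }[/math] have been updated;

3. Collect more data and increase the sequence length [math]\displaystyle{ L }[/math] to obtain a series of increasingly accurate models [math]\displaystyle{ M_l }[/math];

4. If the model complexity diverges as [math]\displaystyle{ L }[/math] grows, that is, [math]\displaystyle{ C_\mu(M_l) = \lVert M_l \rVert \underset{L \to \infty}{\longrightarrow} \infty }[/math], move up one level and return to step 2 to obtain the higher-level model [math]\displaystyle{ M_{l+1} }[/math];

5. If the model complexity converges, a suitable ϵ-machine has been reconstructed and the procedure exits.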
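Steps 3 to 5 can be sketched at a single level as a loop that enlarges the reconstruction and watches whether the complexity settles. The code below is a simplified illustration, not the algorithm of the cited papers: it uses the history length `k` as a stand-in for [math]\displaystyle{ L }[/math], merges histories by comparing empirical distributions over futures of fixed length `f` within a tolerance `tol`, and takes [math]\displaystyle{ C_\mu }[/math] to be the entropy of the merged classes. All names and thresholds are assumptions.

```python
from collections import Counter, defaultdict
from math import log2

def reconstruct_states(seq, k, f=2, tol=0.1):
    """Merge length-k histories with matching length-f future distributions.
    Returns the empirical probability of each merged class."""
    counts = defaultdict(Counter)
    for i in range(k, len(seq) - f + 1):
        counts[seq[i - k:i]][seq[i:i + f]] += 1
    futures = sorted({fut for c in counts.values() for fut in c})
    total = sum(sum(c.values()) for c in counts.values())
    states = []  # each entry: [future distribution, accumulated probability]
    for hist in sorted(counts):
        n = sum(counts[hist].values())
        dist = [counts[hist][x] / n for x in futures]
        for state in states:
            if max(abs(p - q) for p, q in zip(state[0], dist)) <= tol:
                state[1] += n / total
                break
        else:
            states.append([dist, n / total])
    return [prob for _, prob in states]

def statistical_complexity(state_probs):
    """Entropy (bits) of the state distribution, used here as C_mu."""
    return sum(-p * log2(p) for p in state_probs if p > 0)

def reconstruct(seq, max_k=6, eps=0.05):
    """Grow the history length until the estimated complexity stops changing.
    Returns (k, C_mu, converged)."""
    previous = None
    for k in range(1, max_k + 1):
        c_mu = statistical_complexity(reconstruct_states(seq, k))
        if previous is not None and abs(c_mu - previous) < eps:
            return k, c_mu, True       # step 5: complexity has settled
        previous = c_mu
    return max_k, previous, False      # step 4: a higher-level model class is needed

print(reconstruct("01" * 200))
print(reconstruct("001" * 200))
```

When the last return value is `False`, the estimate kept changing up to `max_k`, which in the terms of step 4 signals that the current model class does not give a finite description.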

A chaotic dynamics example

The logistic map

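The following example shows how the theory of computational mechanics applies to a concrete case^[16]: the logistic map from chaotic dynamics, and in particular its period-doubling route to chaos. The data stream used for reconstruction comes from trajectories of the logistic map, generated by iterating [math]\displaystyle{ x_{n+1}=f(x_n) }[/math] with [math]\displaystyle{ f(x) = rx(1-x) }[/math], where the nonlinearity parameter [math]\displaystyle{ r\in[0,4] }[/math] and the initial condition [math]\displaystyle{ x_0\in[0,1] }[/math]. The maximum of the iterated function occurs at [math]\displaystyle{ x_c = \frac12 }[/math].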

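The figure above shows the relation between [math]\displaystyle{ r }[/math] and [math]\displaystyle{ x }[/math] for the iterated function [math]\displaystyle{ f(x) = rx(1-x) }[/math]. For [math]\displaystyle{ r \lt 3.5699... }[/math] the map exhibits period doubling, and for [math]\displaystyle{ r \gt 3.5699... }[/math] chaos appears. Because an observer's resolution is limited, identifying ordered structure within chaos requires coarse-graining [math]\displaystyle{ x }[/math]. This is done by observing the trajectory [math]\displaystyle{ \mathbf{x}=x_0x_1x_2x_3\ldots }[/math] through the binary partition [math]\displaystyle{ \mathcal{P}=\{x_n\in[0,x_c)\Rightarrow s=0,x_n\in[x_c,1]\Rightarrow s=1\} }[/math], which converts it into a discrete symbol sequence. This partition is "generating", meaning that sufficiently long binary sequences correspond to arbitrarily small intervals of initial conditions. The coarse-grained observations [math]\displaystyle{ \mathcal{P} }[/math] can therefore be used to study information processing in the logistic map.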
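The iteration and the binary partition translate directly into a few lines of code. The sketch below follows the definitions above; the function name, the default initial condition, and the number of discarded transient steps (`burn_in`) are assumptions added for illustration.

```python
def logistic_symbols(r, x0=0.3, n=1000, burn_in=1000):
    """Iterate x -> r*x*(1-x) and emit '0' if x < 1/2, else '1'."""
    x = x0
    for _ in range(burn_in):          # discard the transient
        x = r * x * (1 - x)
    symbols = []
    for _ in range(n):
        x = r * x * (1 - x)
        symbols.append("0" if x < 0.5 else "1")
    return "".join(symbols)

print(logistic_symbols(3.5, n=16))    # period-4 regime
print(logistic_symbols(4.0, n=16))    # chaotic regime
```

The resulting string can be fed to any symbolic analysis, for example a block-entropy estimate or a causal-state grouping of histories.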

Statistical complexity versus entropy rate

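Panel (a) of the figure above plots the statistical complexity [math]\displaystyle{ C_\mu }[/math] against the entropy rate estimate [math]\displaystyle{ H(L)/L }[/math] for the logistic map. Triangles mark the approximate positions of [math]\displaystyle{ (C_\mu ,H(L)/L) }[/math] for 193 values of the nonlinearity parameter [math]\displaystyle{ r }[/math], with subsequence length [math]\displaystyle{ L=16 }[/math]. The heavy solid line overlaying part of the empirical data is the analytical curve for [math]\displaystyle{ H(L)/L }[/math] derived at [math]\displaystyle{ C_\mu =0 }[/math]. The figure shows two important features. First, the extreme values of entropy give zero complexity: the simplest periodic processes at [math]\displaystyle{ H(L)/L=0 }[/math] and the most random processes at [math]\displaystyle{ H(L)/L=1 }[/math] are both statistically simple, with zero complexity, because each is described by an ϵ-machine with a single state. Second, between the two extremes the processes are markedly more complex, with a pronounced peak near a critical entropy [math]\displaystyle{ H_c }[/math] (here [math]\displaystyle{ r=3.5699... }[/math]). Below [math]\displaystyle{ H_c }[/math] the data sets were produced at parameters where the behavior is periodic (including parameters inside the chaotic regime where the behavior is periodic); above [math]\displaystyle{ H_c }[/math] they were produced at chaotic parameters. This figure can be read alongside panel (b) in the statistical complexity section.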

Panel (b) shows, for the logistic map at [math]\displaystyle{ r=3.5699... }[/math], the number of inferred states [math]\displaystyle{ \left|\mathbf{V}\right| }[/math] as a function of the sequence length [math]\displaystyle{ L }[/math]; at [math]\displaystyle{ L=64 }[/math], [math]\displaystyle{ \left|\mathbf{V}\right|=196 }[/math]. As [math]\displaystyle{ L }[/math] grows, [math]\displaystyle{ \left|\mathbf{V}\right| }[/math] diverges. To obtain a finite description, the model must be innovated, that is, upgraded to a model class with greater descriptive power.

Model upgrade

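The three figures above show one route of model reconstruction and innovation. Panel (a) is the 47-state ϵ-machine reconstructed for the logistic map at [math]\displaystyle{ r=3.5699... }[/math] with sequence length [math]\displaystyle{ L=16 }[/math]. The regularity it captures is not obvious. Replacing the unbranched paths in the machine with the corresponding symbol sequences gives panel (b), in which the branching states are quite regular. Going one step further, panel (b) is upgraded to a string production machine that describes the regularity in the machine's growth, shown in panel (c). This machine is a finite automaton with two kinds of states (the original kind drawn as circles, the new kind as squares) and two registers, A and B, which hold binary strings. Initially A holds 0 and B holds 1, and B' denotes the string in B with its last bit flipped. Inspection of panel (b) shows that the string manipulation can be described as appending a copy of A's contents to B and replacing A's contents with two copies of B's contents. These strings are iterated at the square states, with production rules A→BB and B→BA. Representation (c) clearly uses fewer computational resources than (a) and has greater descriptive power.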
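The register rules can be tried out directly. The sketch below assumes that both productions are applied simultaneously at each step, starting from A = 0 and B = 1 as stated above; it models only the two registers, not the full machine of panel (c), and the function names are illustrative.

```python
def flip_last(bits):
    """Return the string with its last bit inverted (the B' operation)."""
    return bits[:-1] + ("1" if bits[-1] == "0" else "0")

def iterate_registers(steps):
    """Apply A -> BB and B -> BA simultaneously, starting from A='0', B='1'."""
    a, b = "0", "1"
    history = [(a, b)]
    for _ in range(steps):
        a, b = b + b, b + a   # both right-hand sides use the old A and B
        history.append((a, b))
    return history

for a, b in iterate_registers(3):
    print(a, b)
```

Under this reading the register lengths double at every step, and in every step A equals B with its last bit flipped, so a two-rule description replaces a machine whose state count grows without bound.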

Similarities between computational mechanics and causal emergence theory

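Many concepts in computational mechanics have approximately equivalent counterparts in causal emergence theory. Mapping and comparing the two can broaden the understanding and study of emergence.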

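The time series in computational mechanics corresponds to the microstates in causal emergence, the states [math]\displaystyle{ \mathcal{R}_i \in \mathcal{R} }[/math] obtained by partitioning correspond to macrostates, and the causal transition mapping [math]\displaystyle{ T }[/math] corresponds to the effective macroscopic dynamics.
The state mapping function [math]\displaystyle{ \eta }[/math] in computational mechanics corresponds to a coarse-graining strategy in causal emergence, and the causal-state mapping [math]\displaystyle{ \epsilon }[/math] corresponds to the coarse-graining strategy that maximizes effective information.
The ϵ-machine in computational mechanics and the Neural Information Squeezer (NIS+) in causal emergence are also similar: the ϵ-machine identifies causal states and predicts future states, while NIS+ identifies and generates macrostates that maximize effective information. Both retain as much useful information as possible.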

References

[1] N. H. Packard, J. P. Crutchfield, J. D. Farmer, and R. S. Shaw. Geometry from a time series. Phys. Rev. Lett., 45:712, 1980.
[2] F. Takens. Detecting strange attractors in fluid turbulence. In D. A. Rand and L. S. Young, editors, Symposium on Dynamical Systems and Turbulence, volume 898, page 366, Berlin, 1981. Springer-Verlag.
[3] A. Brandstater, J. Swift, Harry L. Swinney, A. Wolf, J. D. Farmer, E. Jen, and J. P. Crutchfield. Low-dimensional chaos in a hydrodynamic system. Phys. Rev. Lett., 51:1442, 1983.
[4] J. P. Crutchfield and K. Young. Inferring statistical complexity. Phys. Rev. Lett., 63:105–108, 1989.
[5] M. Minsky. Computation: Finite and Infinite Machines. Prentice-Hall, Englewood Cliffs, New Jersey, 1967.
[6] N. Chomsky. Three models for the description of language. IRE Trans. Info. Th., 2:113–124, 1956.
[7] J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Reading, 1979.
[8] B. Holldobler and E. O. Wilson. The Ants. Belknap Press of Harvard University Press, Cambridge, Massachusetts, 1990.
[9] C. W. Reynolds. Flocks, herds, and schools: A distributed behavioral model. Computer Graphics, 21:25–34, 1987.
[10] E. F. Fama. Efficient capital markets II. J. Finance, 46:1575–1617, 1991.
[11] L. Barnett and A. K. Seth. Dynamical independence: discovering emergent macroscopic processes in complex dynamical systems. Physical Review E, 108(1):014304, 2023.
[12] A. K. Seth. Measuring emergence via nonlinear Granger causality. In: ALIFE, Vol. 2008, 2008, pp. 545–552.
[13] J. P. Crutchfield and K. Young. Inferring statistical complexity. Phys. Rev. Lett., 63:105–108, 1989.
[14] C. H. Bennett. Dissipation, information, computational complexity, and the definition of organization. In D. Pines, editor, Emerging Syntheses in the Sciences. Addison-Wesley, Redwood City, 1988.
[15] C. R. Shalizi and J. P. Crutchfield. Computational Mechanics: Pattern and Prediction, Structure and Simplicity. Journal of Statistical Physics, 104(3/4):817–879, 2001.
[16] J. P. Crutchfield. The Calculi of Emergence: Computation, Dynamics, and Induction. SFI 94-03-016, 1994.
