“自由能原理”的版本间的差异

来自集智百科 - 复杂系统|人工智能|复杂科学|复杂网络|自组织
第128行: 第128行:
 
  <math> \underset{\text{free-energy}} {\underbrace{ F(s,\mu)}} = \underset{\text{complexity}} {\underbrace{ D_\mathrm{KL}[q(\psi\mid\mu)\parallel p(\psi\mid m)]}} - \underset{\mathrm{accuracy}} {\underbrace{E_q[\log p(s\mid\psi,m)]}}</math>
 
  
 
This induces a dual minimisation with respect to action and internal states that correspond to action and perception respectively.
 
  
 
这导致了一个双重最小化的行动和内部状态,分别对应于行动和感知。
 
  
Models with minimum free energy provide an accurate explanation of data, under complexity costs (c.f., Occam's razor and more formal treatments of computational costs). Here, complexity is the divergence between the variational density and prior beliefs about hidden states (i.e., the effective degrees of freedom used to explain the data).
 
  
 
具有最小自由能的模型在复杂度成本(c.f.,Occam's razor和更正式的计算成本处理)下提供了数据的精确解释。这里,复杂性是变分密度和关于隐藏状态的先验信念(即用于解释数据的有效自由度)之间的差异。
 
第143行: 第140行:
 
=== Free energy minimisation and self-organisation 自由能最小化和自组织===
 
  
 
 
Variational free energy is an information theoretic functional and is distinct from thermodynamic (Helmholtz) free energy. However, the complexity term of variational free energy shares the same fixed point as Helmholtz free energy (under the assumption the system is thermodynamically closed but not isolated). This is because if sensory perturbations are suspended (for a suitably long period of time), complexity is minimised (because accuracy can be neglected). At this point, the system is at equilibrium and internal states minimise Helmholtz free energy, by the principle of minimum energy.
 
  
 
变分自由能是一种信息论泛函,不同于热力学(亥姆霍兹)自由能。然而,变分自由能的复杂性项与亥姆霍兹自由能具有相同的固定点(假设系统是热力学闭合的,而不是孤立的)。这是因为如果感觉干扰暂停(适当长的时间) ,复杂性是最小的(因为准确性可以忽略)。在这一点上,系统处于平衡状态,内部状态通过最小能量原理使亥姆霍兹自由能最小。
 
  
Free energy minimisation has been proposed as a hallmark of self-organising systems when cast as [[random dynamical system]]s.<ref>Crauel, H., & Flandoli, F. (1994). [https://www.researchgate.net/profile/Hans_Crauel/publication/227072665_Attractor_for_random_dynamical_systems/links/57c2033708aed246b0fe05b5/Attractor-for-random-dynamical-systems.pdf Attractors for random dynamical systems]. Probab Theory Relat Fields, 100, 365–393.</ref> This formulation rests on a [[Markov blanket]] (comprising action and sensory states) that separates internal and external states. If internal states and action minimise free energy, then they place an upper bound on the entropy of sensory states
 
  
 
自由能最小化被认为是[[自组织系统]]的一个标志。<ref>Crauel, H., & Flandoli, F. (1994). [https://www.researchgate.net/profile/Hans_Crauel/publication/227072665_Attractor_for_random_dynamical_systems/links/57c2033708aed246b0fe05b5/Attractor-for-random-dynamical-systems.pdf Attractors for random dynamical systems]. Probab Theory Relat Fields, 100, 365–393.</ref> 这个公式建立在一个[[马尔可夫毯]](包括行动和感觉状态)分离内部和外部状态。如果内部状态和行为使自由能最小化,那么它们就给感官状态的熵设置了一个上限。
 
第157行: 第150行:
 
: <math> \lim_{T\to\infty} \frac{1}{T} \int_0^T \underset{\text{surprise}}{\underbrace{-\log p(s(t)\mid m)}} \, dt = H[p(s\mid m)] </math>
  
Free energy minimisation is equivalent to maximising the mutual information between sensory states and internal states that parameterise the variational density (for a fixed entropy variational density). This relates free energy minimisation to the principle of minimum redundancy and to related treatments that use information theory to describe optimal behaviour.
 
  
 
自由能最小化相当于最大化感观状态和内部状态之间的互信息,使变分密度参数化(对于固定熵变分密度)。利用信息论描述最优行为的相关处理。
 
  
This is because – under [[Ergodic theory|ergodic]] assumptions – the long-term average of surprise is entropy. This bound resists a natural tendency to disorder – of the sort associated with the [[second law of thermodynamics]] and the [[fluctuation theorem]].
 
  
 
这是因为在[[遍历理论|遍历]]假设下,惊喜的长期平均值是熵。这个界限抵抗了一种自然的无序倾向,这种无序倾向与[[热力学第二定律]]和[[涨落定理]]有关。
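
The ergodic statement above can be checked numerically. The following is a minimal sketch, not taken from the source: it draws independent samples from a fixed discrete density and compares the time-averaged surprise with the entropy. The four-state density <code>p</code>, the sample count, and the use of independent sampling (a trivially ergodic process) are illustrative assumptions.

```python
import math
import random


def entropy(p):
    """Shannon entropy (in nats) of a discrete density p."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def average_surprise(p, n_samples, seed=0):
    """Time average of the surprise -log p(s) over samples drawn from p."""
    rng = random.Random(seed)
    samples = rng.choices(range(len(p)), weights=p, k=n_samples)
    return sum(-math.log(p[s]) for s in samples) / n_samples


# Hypothetical density over four sensory states (an assumption for illustration).
p = [0.5, 0.25, 0.125, 0.125]

H = entropy(p)                      # entropy H[p(s | m)]
avg = average_surprise(p, 200_000)  # long-run average of surprise

print(f"entropy          = {H:.4f} nats")
print(f"average surprise = {avg:.4f} nats")
```

With enough samples the two printed values should agree closely, which is the sense in which a bound on surprise over time is also a bound on the entropy of sensory states.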
 
第167行: 第158行:
 
=== Free energy minimisation and Bayesian inference 自由能最小化与贝叶斯推理===
 
  
Free energy minimisation provides a useful way to formulate normative (Bayes optimal) models of neuronal inference and learning under uncertainty and therefore subscribes to the Bayesian brain hypothesis. The neuronal processes described by free energy minimisation depend on the nature of hidden states: <math> \Psi = X \times \Theta \times \Pi </math> that can comprise time-dependent variables, time-invariant parameters and the precision (inverse variance or temperature) of random fluctuations. Minimising free energy with respect to variables, parameters, and precision corresponds to inference, learning, and the encoding of uncertainty, respectively.
 
  
 
自由能最小化为在不确定性条件下建立神经元推理和学习的规范(Bayes最优)模型提供了一种有用的方法,因此符合贝叶斯Bayesian脑假设。由自由能最小化描述的神经元过程取决于隐藏状态的性质:<math> \Psi = X \times \Theta \times \Pi </math>,它可以包括时间相关变量、时不变参数和随机波动的精度(逆方差或温度)。最小化变量、参数和精度分别对应于推理、学习和不确定性编码。
 
  
 
All Bayesian inference can be cast in terms of free energy minimisation.<ref>Roweis, S., & [[Zoubin Ghahramani|Ghahramani, Z.]] (1999). [http://authors.library.caltech.edu/13697/1/ROWnc99.pdf A unifying review of linear Gaussian models]. Neural Computat. , 11 (2), 305–45. {{doi|10.1162/089976699300016674}}</ref>{{Failed verification|date=April 2020}} When free energy is minimised with respect to internal states, the [[Kullback–Leibler divergence]] between the variational and posterior density over hidden states is minimised. This corresponds to approximate [[Bayesian inference]] – when the form of the variational density is fixed – and exact [[Bayesian inference]] otherwise. Free energy minimisation therefore provides a generic description of Bayesian inference and filtering (e.g., [[Kalman filter]]ing). It is also used in Bayesian [[model selection]], where free energy can be usefully decomposed into complexity and accuracy:
 
  
 
所有的贝叶斯推断都可以用自由能最小化来表示,例如,<ref>Roweis, S., & [[Zoubin Ghahramani|Ghahramani, Z.]] (1999). [http://authors.library.caltech.edu/13697/1/ROWnc99.pdf A unifying review of linear Gaussian models]. Neural Computat. , 11 (2), 305–45. {{doi|10.1162/089976699300016674}}</ref>{{验证失败|日期=2020年4月}}当自由能相对于内部态最小化时,隐态上变分密度和后验密度之间的[[Kullback–Leibler散度]]最小化。当变分密度的形式固定时,这对应于近似的[[贝叶斯推理]],否则对应于精确的[[贝叶斯推理]]。因此,自由能最小化提供了贝叶斯推理和滤波的一般描述(例如,[[Kalman filter]]ing)。它也用于贝叶斯[[模型选择]],其中自由能可以有效地分解为复杂性和准确性:
 
第178行: 第166行:
 
: <math> \underset{\text{free-energy}} {\underbrace{ F(s,\mu)}} = \underset{\text{complexity}} {\underbrace{ D_\mathrm{KL}[q(\psi\mid\mu)\parallel p(\psi\mid m)]}} - \underset{\mathrm{accuracy}} {\underbrace{E_q[\log p(s\mid\psi,m)]}}</math>
 
  
Free energy minimisation formalises the notion of unconscious inference in perception
 
  
 
自由能最小化使知觉中的无意识推理的概念正规化
 
  
Models with minimum free energy provide an accurate explanation of data, under complexity costs (c.f., [[Occam's razor]] and more formal treatments of computational costs<ref>Ortega, P. A., & Braun, D. A. (2012). [http://rspa.royalsocietypublishing.org/content/469/2153/20120683 Thermodynamics as a theory of decision-making with information processing costs].  Proceedings of the Royal Society A, vol. 469, no. 2153 (20120683) .</ref>). Here, complexity is the divergence between the variational density and prior beliefs about hidden states (i.e., the effective degrees of freedom used to explain the data).
 
  
 
具有最小自由能的模型提供了数据的精确解释,降低了复杂性成本(c.f.,[[奥卡姆剃刀]]和计算成本的更正式的处理方法<ref>Ortega, P. A., & Braun, D. A. (2012). [http://rspa.royalsocietypublishing.org/content/469/2153/20120683 Thermodynamics as a theory of decision-making with information processing costs].  Proceedings of the Royal Society A, vol. 469, no. 2153 (20120683) .</ref>)。这里,复杂性是变分密度和关于隐藏状态的先验信念(即用于解释数据的有效自由度)之间的差异。
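
The decomposition of free energy into complexity and accuracy can be illustrated with a small discrete example. This is a sketch under stated assumptions, not the source's implementation: the two hidden states, the prior, and the likelihood values are hypothetical, and the helper names <code>kl</code> and <code>free_energy</code> are introduced here for illustration only.

```python
import math


def kl(q, p):
    """Kullback-Leibler divergence D_KL[q || p] for discrete densities."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def free_energy(q, prior, lik_s):
    """Return (F, complexity, accuracy) with F = complexity - accuracy.

    q      : variational density q(psi | mu)
    prior  : prior beliefs p(psi | m)
    lik_s  : likelihood p(s | psi, m) of the observed s, one value per psi
    """
    complexity = kl(q, prior)
    accuracy = sum(qi * math.log(li) for qi, li in zip(q, lik_s) if qi > 0)
    return complexity - accuracy, complexity, accuracy


# Hypothetical two-state model (values are assumptions for illustration).
prior = [0.5, 0.5]
lik_s = [0.9, 0.2]

evidence = sum(p * l for p, l in zip(prior, lik_s))          # p(s | m)
posterior = [p * l / evidence for p, l in zip(prior, lik_s)]  # exact Bayes
surprise = -math.log(evidence)

F_prior, c_prior, a_prior = free_energy(prior, prior, lik_s)
F_post, c_post, a_post = free_energy(posterior, prior, lik_s)

print(f"q = prior:     F = {F_prior:.4f} (complexity {c_prior:.4f}, accuracy {a_prior:.4f})")
print(f"q = posterior: F = {F_post:.4f} (complexity {c_post:.4f}, accuracy {a_post:.4f})")
print(f"surprise -log p(s|m) = {surprise:.4f}")
```

Setting the variational density to the prior gives zero complexity but poor accuracy. Setting it to the exact posterior pays a complexity cost and brings free energy down to the surprise, which is its lower bound.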
 
第190行: 第176行:
 
=== Free energy minimisation and thermodynamics 自由能最小化与热力学===
 
  
Usually, the generative models that define free energy are non-linear and hierarchical (like cortical hierarchies in the brain). Special cases of generalised filtering include Kalman filtering, which is formally equivalent to predictive coding – a popular metaphor for message passing in the brain. Under hierarchical models, predictive coding involves the recurrent exchange of ascending (bottom-up) prediction errors and descending (top-down) predictions that is consistent with the anatomy and physiology of sensory and motor systems.
 
  
 
通常,定义自由能的生成模型是非线性和层次化的(就像大脑中的皮层层次结构)。广义滤波的特例包括Kalman滤波,它在形式上等同于预测编码(predictive coding)——大脑中信息传递的一个流行隐喻。在层次模型下,预测编码涉及到上升(自下而上)预测错误和下降(自上而下)预测的反复交换,这与感觉和运动系统的解剖和生理学是一致的。
 
  
  
Variational free energy is an information theoretic functional and is distinct from thermodynamic (Helmholtz) [[Helmholtz free energy|free energy]].<ref>Evans, D. J. (2003). [http://rscweb.anu.edu.au/~evans/papers/NEFET.pdf A non-equilibrium free energy theorem for deterministic systems]. Molecular Physics , 101, 15551–4.</ref> However, the complexity term of variational free energy shares the same fixed point as Helmholtz free energy (under the assumption the system is thermodynamically closed but not isolated). This is because if sensory perturbations are suspended (for a suitably long period of time), complexity is minimised (because accuracy can be neglected). At this point, the system is at equilibrium and internal states minimise Helmholtz free energy, by the [[principle of minimum energy]].<ref>Jarzynski, C. (1997). [https://arxiv.org/abs/cond-mat/9610209 Nonequilibrium equality for free energy differences]. Phys. Rev. Lett., 78, 2690.</ref>
变分自由能是一种信息论泛函,不同于热力学(亥姆霍兹Helmholtz)[[Helmholtz自由能|自由能]]。<ref>Evans, D. J. (2003). [http://rscweb.anu.edu.au/~evans/papers/NEFET.pdf A non-equilibrium free energy theorem for deterministic systems]. Molecular Physics , 101, 15551–4.</ref>然而,变分自由能的复杂性项与Helmholtz自由能具有相同的不动点(假设系统是热力学封闭而非孤立的)。这是因为如果感官干扰被暂停(一段适当长的时间),复杂性被最小化(因为准确度可以忽略)。此时,系统处于平衡状态,内部状态根据[[最小能量原理]]<ref>Jarzynski, C. (1997). [https://arxiv.org/abs/cond-mat/9610209 Nonequilibrium equality for free energy differences]. Phys. Rev. Lett., 78, 2690.</ref>使亥姆霍兹自由能最小化。
 
  
 
=== Free energy minimisation and information theory 自由能最小化与信息论===
 
  
In predictive coding, optimising model parameters through a gradient descent on the time integral of free energy (free action) reduces to associative or Hebbian plasticity and is associated with synaptic plasticity in the brain.
 
  
 
在预测编码中,通过对自由能的时间积分(自由作用量)进行梯度下降来优化模型参数,可归结为联想可塑性或赫布(Hebbian)可塑性,并与大脑中的突触可塑性相关。
  
Free energy minimisation is equivalent to maximising the [[mutual information]] between sensory states and internal states that parameterise the variational density (for a fixed entropy variational density).<ref name="Friston" />{{Better source|date=February 2020|reason=MDPI is a questionable source}} This relates free energy minimization to the principle of minimum redundancy<ref>Barlow, H. (1961). [http://www.trin.cam.ac.uk/horacebarlow/21.pdf Possible principles underlying the transformations of sensory messages] {{Webarchive|url=https://web.archive.org/web/20120603182706/http://www.trin.cam.ac.uk/horacebarlow/21.pdf |date=2012-06-03 }}. In W. Rosenblith (Ed.), Sensory Communication (pp. 217-34). Cambridge, MA: MIT Press.</ref> and related treatments using information theory to describe optimal behaviour.<ref>Linsker, R. (1990). [https://www.annualreviews.org/doi/pdf/10.1146/annurev.ne.13.030190.001353 Perceptual neural organization: some approaches based on network models and information theory]. Annu Rev Neurosci. , 13, 257–81.</ref><ref>Bialek, W., Nemenman, I., & Tishby, N. (2001). [http://www.princeton.edu/~wbialek/our_papers/bnt_01a.pdf Predictability, complexity, and learning]. Neural Computat., 13 (11), 2409–63.</ref>
 
  
 
自由能最小化相当于最大化感官状态和内部状态之间的[[互信息]],使变分密度参数化(对于固定熵变分密度)<ref name="Friston" />{{Better source|date=February 2020|reason=MDPI is a questionable source}}这将自由能最小化与最小冗余原则联系起来。<ref>Barlow, H. (1961). [http://www.trin.cam.ac.uk/horacebarlow/21.pdf Possible principles underlying the transformations of sensory messages] {{Webarchive|url=https://web.archive.org/web/20120603182706/http://www.trin.cam.ac.uk/horacebarlow/21.pdf |date=2012-06-03 }}. In W. Rosenblith (Ed.), Sensory Communication (pp. 217-34). Cambridge, MA: MIT Press.</ref>并且联系到用信息论描述最优行为的相关处理<ref>Linsker, R. (1990).[https://www.annualreviews.org/doi/pdf/10.1146/annurev.ne.13.030190.001353 Perceptual neural organization: some approaches based on network models and information theory]. Annu Rev Neurosci. , 13, 257–81.</ref><ref>Bialek, W., Nemenman, I., & Tishby, N. (2001). [http://www.princeton.edu/~wbialek/our_papers/bnt_01a.pdf Predictability, complexity, and learning]. Neural Computat., 13 (11), 2409–63.</ref>
 
  
 
 
Optimizing the precision parameters corresponds to optimizing the gain of prediction errors (c.f., Kalman gain). In neuronally plausible implementations of predictive coding,
 
  
 
优化精度参数对应于优化预测误差的增益(c.f.,Kalman增益)。在预测编码的神经元层面合理的实现中,
第217行: 第195行:
 
== Free energy minimisation in neuroscience 神经科学中的自由能最小化==
 
  
 
 
Simulation of the results achieved from a selective attention task carried out by the Bayesian reformulation of the SAIM entitled PE-SAIM in multiple objects environment. The graphs show the time course of the activation for the FOA and the two template units in the Knowledge Network.
 
  
 
在多目标环境下,通过贝叶斯重构的 SAIM 算法对选择性注意任务的结果进行了仿真。这些图表显示了知识网络中 FOA 和两个模板单元的激活时间过程。
 
  
Free energy minimisation provides a useful way to formulate normative (Bayes optimal) models of neuronal inference and learning under uncertainty<ref>Friston, K. (2010). [http://www.fil.ion.ucl.ac.uk/~karl/The%20free-energy%20principle%20A%20unified%20brain%20theory.pdf The free-energy principle: a unified brain theory?] Nat Rev Neurosci. , 11 (2), 127–38.</ref> and therefore subscribes to the [[Bayesian brain]] hypothesis.<ref>Knill, D. C., & Pouget, A. (2004). [http://mrl.isr.uc.pt/pub/bscw.cgi/d27540/ReviewKnillPouget2.pdf The Bayesian brain: the role of uncertainty in neural coding and computation]. Trends Neurosci. , 27 (12), 712–9.</ref> The neuronal processes described by free energy minimisation depend on the nature of hidden states: <math> \Psi = X \times \Theta \times \Pi </math> that can comprise time-dependent variables, time-invariant parameters and the precision (inverse variance or temperature) of random fluctuations. Minimising variables, parameters, and precision correspond to inference, learning, and the encoding of uncertainty, respectively.
 
  
 
自由能最小化为在不确定性条件下建立神经元推理和学习的规范(Bayes最优)模型提供了一种有效的方法<ref>Friston, K. (2010). [http://www.fil.ion.ucl.ac.uk/~karl/The%20free-energy%20principle%20A%20unified%20brain%20theory.pdf The free-energy principle: a unified brain theory?] Nat Rev Neurosci. , 11 (2), 127–38.</ref> 因此符合[[贝叶斯脑]]假说<ref>Knill, D. C., & Pouget, A. (2004). [http://mrl.isr.uc.pt/pub/bscw.cgi/d27540/ReviewKnillPouget2.pdf The Bayesian brain: the role of uncertainty in neural coding and computation]. Trends Neurosci. , 27 (12), 712–9.</ref>。由自由能最小化描述的神经元过程取决于隐藏状态的性质:<math>\Psi=X\times\Theta\times\Pi</math>,它可以包括时间相关变量、时不变参数和随机波动的精度(逆方差或温度)。最小化变量、参数和精度分别对应于推理、学习和不确定性编码。
 
  
Concerning the top-down versus bottom-up controversy, which has been addressed as a major open problem of attention, a computational model has succeeded in illustrating the circular nature of the reciprocation between top-down and bottom-up mechanisms. Using an established emergent model of attention, namely SAIM, the authors proposed a model called PE-SAIM that, in contrast to the standard version, approaches selective attention from a top-down stance. The model takes into account the forward prediction errors sent to the same level or the level above in order to minimise the energy function, which indicates the difference between the data and its cause or, in other words, between the generative model and the posterior. To enhance validity, they also incorporated neural competition between the stimuli into their model. A notable feature of this model is the reformulation of the free energy function solely in terms of prediction errors during task performance.
 
  
 
关于自上而下与自下而上的争论,已经被作为一个主要的开放性的注意问题,一个计算模型已经成功地说明了自上而下和自下而上机制之间的往复循环性质。利用已建立的注意涌现模型SAIM,作者提出了一个称为PE-SAIM的模型,与标准模型相比,该模型从自上而下的角度来处理选择性注意。该模型考虑了发送到同一级别或更高级别的转发预测误差,以最小化表示数据及其原因之间的差异的能量函数,换句话说,生成模型和后验模型之间的差异。为了提高有效性,他们还在模型中加入了刺激物之间的神经竞争。该模型的一个显著特点是仅根据任务执行过程中的预测误差来重新构造自由能函数。
 
第240行: 第213行:
 
: <math>\frac{\partial E^{total}(y^{VP},x^{SN},x^{CN},y^{KN})}{\partial y^{SN}_{mn}} = x^{CN}_{mn} - b^{CN}\varepsilon^{CN}_{nm} + b^{CN}\sum_{k}(\varepsilon^{KN}_{knm})</math>
  
Free energy minimisation formalises the notion of [[unconscious inference]] in perception<ref name="Helmholtz" /><ref name="Dayan" /> and provides a normative (Bayesian) theory of neuronal processing. The associated process theory of neuronal dynamics is based on minimising free energy through gradient descent. This corresponds to [[Generalized filtering|generalised Bayesian filtering]] (where ~ denotes a variable in generalised coordinates of motion and  <math>D</math> is a derivative matrix operator):<ref>Friston, K., Stephan, K., Li, B., & Daunizeau, J. (2010). [http://www.fil.ion.ucl.ac.uk/~karl/Generalised%20Filtering.pdf Generalised Filtering]. Mathematical Problems in Engineering, vol., 2010, 621670</ref>
 
  
 
自由能最小化使知觉中的[[无意识推理]]概念正式化<ref name="Helmholtz" /><ref name="Dayan" />并提供了神经元处理的规范(贝叶斯)理论。神经元动力学的相关过程理论是基于通过梯度下降最小化自由能。这对应于[[广义滤波|广义贝叶斯滤波]](其中~表示广义运动坐标中的变量,<math>D</math>是一个导数矩阵运算符):<ref>Friston, K., Stephan, K., Li, B., & Daunizeau, J. (2010). [http://www.fil.ion.ucl.ac.uk/~karl/Generalised%20Filtering.pdf Generalised Filtering]. Mathematical Problems in Engineering, vol., 2010, 621670</ref>
 
  
where <math>E^{total}</math> is the total energy function of the neural networks, and <math>\varepsilon^{KN}_{knm}</math> is the prediction error between the generative model (prior) and the posterior, which changes over time.
 
  
 
其中,<math>E^{total}</math>是神经网络的总能量函数,而 <math>\varepsilon^{KN}_{knm}</math>是生成模型前和后随时间变化的预测误差。
 
第250行: 第221行:
 
: <math>\dot{\tilde{\mu}} = D \tilde{\mu} - \partial_{\mu}F(s,\mu)\Big|_{\mu = \tilde{\mu}}</math>
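
The gradient descent in the equation above can be sketched for the simplest possible case. The assumptions here are not from the source: a single static hidden state (so only the zeroth-order generalised coordinate is kept and the <math>D\tilde{\mu}</math> term is zero), a Gaussian prior with mean <code>eta</code> and precision <code>pi_p</code>, a Gaussian likelihood centred on the hidden state with precision <code>pi_s</code>, and a point (Laplace-style) estimate <code>mu</code>. Constant terms of the free energy are dropped.

```python
def free_energy(s, mu, eta, pi_s, pi_p):
    """Free energy (up to a constant) as a sum of precision-weighted squared prediction errors."""
    return 0.5 * pi_s * (s - mu) ** 2 + 0.5 * pi_p * (mu - eta) ** 2


def dF_dmu(s, mu, eta, pi_s, pi_p):
    """Gradient of the free energy with respect to the internal state mu."""
    return -pi_s * (s - mu) + pi_p * (mu - eta)


def perceive(s, eta, pi_s, pi_p, dt=0.05, steps=500):
    """Euler integration of d(mu)/dt = -dF/dmu, starting from the prior mean."""
    mu = eta
    for _ in range(steps):
        mu -= dt * dF_dmu(s, mu, eta, pi_s, pi_p)
    return mu


# Hypothetical sensory sample and model settings (assumptions for illustration).
s, eta, pi_s, pi_p = 2.0, 0.0, 4.0, 1.0

mu_hat = perceive(s, eta, pi_s, pi_p)
mu_exact = (pi_s * s + pi_p * eta) / (pi_s + pi_p)  # precision-weighted posterior mean

print(f"gradient descent estimate: {mu_hat:.6f}")
print(f"exact posterior mean:      {mu_exact:.6f}")
```

In this linear Gaussian case the descent settles on the precision-weighted average of the prior mean and the sensory sample, which is the exact posterior mean. Non-linear or hierarchical models would only reach an approximation.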
 
  
Comparing the two models reveals a notable similarity between their results, but also a remarkable discrepancy: the standard version of SAIM focuses mainly on excitatory connections, whereas PE-SAIM leverages inhibitory connections to make an inference. The model has also proved able to predict, with high precision, EEG and fMRI data drawn from human experiments.
 
  
 
比较这两个模型的结果发现他们之间有显著的相似性,同时指出了一个显著的差异,即在SAIM的标准版本中,模型的重点主要是兴奋性连接,而在PE-SAIM中,抑制性连接将被用来进行推断。该模型对人体实验的脑电和功能磁共振数据具有较高的预测精度。
 
  
Usually, the generative models that define free energy are non-linear and hierarchical (like cortical hierarchies in the brain). Special cases of generalised filtering include [[Kalman filter]]ing, which is formally equivalent to [[predictive coding]]<ref>Rao, R. P., & Ballard, D. H. (1999). [https://www.cs.utexas.edu/users/dana/nn.pdf Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects]. Nat Neurosci. , 2 (1), 79–87.</ref> – a popular metaphor for message passing in the brain. Under hierarchical models, predictive coding involves the recurrent exchange of ascending (bottom-up) prediction errors and descending (top-down) predictions<ref name="Mumford">Mumford, D. (1992). [http://cs.brown.edu/people/tld/projects/cortex/course/suggested_reading_list/supplements/documents/MumfordBC-92.pdf On the computational architecture of the neocortex]. II. Biol. Cybern. , 66, 241–51.</ref> that is consistent with the anatomy and physiology of sensory<ref>Bastos, A. M., Usrey, W. M., Adams, R. A., Mangun, G. R., Fries, P., & Friston, K. J. (2012). [http://www.fil.ion.ucl.ac.uk/~karl/Canonical%20Microcircuits%20for%20Predictive%20Coding.pdf Canonical microcircuits for predictive coding]. Neuron , 76 (4), 695–711.</ref> and motor systems.<ref>Adams, R. A., Shipp, S., & Friston, K. J. (2013). [http://www.fil.ion.ucl.ac.uk/~karl/Predictions%20not%20commands%20-%20active%20inference%20in%20the%20motor%20system.pdf Predictions not commands: active inference in the motor system]. Brain Struct Funct. , 218 (3), 611–43</ref>
 
  
 
通常,定义自由能的生成模型是非线性和层次结构的(就像大脑中的皮层层次结构)。广义滤波的特殊情况包括[[Kalman filter]]ing,它在形式上等价于[[预测编码]]<ref>Rao, R. P., & Ballard, D. H. (1999). [https://www.cs.utexas.edu/users/dana/nn.pdf Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects]. Nat Neurosci. , 2 (1), 79–87.</ref>,这是一种关于大脑中信息传递的流行隐喻。在分层模型下,预测编码涉及到上升(自下而上)预测错误和下降(自上而下)预测的循环交换<ref name="Mumford">Mumford, D. (1992). [http://cs.brown.edu/people/tld/projects/cortex/course/suggested_reading_list/supplements/documents/MumfordBC-92.pdf On the computational architecture of the neocortex]. II. Biol. Cybern. , 66, 241–51.</ref>这与感觉系统的解剖学和生理学<ref>Bastos, A. M., Usrey, W. M., Adams, R. A., Mangun, G. R., Fries, P., & Friston, K. J. (2012). [http://www.fil.ion.ucl.ac.uk/~karl/Canonical%20Microcircuits%20for%20Predictive%20Coding.pdf Canonical microcircuits for predictive coding]. Neuron , 76 (4), 695–711.</ref>以及运动系统<ref>Adams, R. A., Shipp, S., & Friston, K. J. (2013). [http://www.fil.ion.ucl.ac.uk/~karl/Predictions%20not%20commands%20-%20active%20inference%20in%20the%20motor%20system.pdf Predictions not commands: active inference in the motor system]. Brain Struct Funct. , 218 (3), 611–43</ref>是一致的。
第260行: 第229行:
 
=== Perceptual learning and memory 知觉学习与记忆===
 
  
When gradient descent is applied to action <math> \dot{a} = -\partial_aF(s,\tilde{\mu}) </math>, motor control can be understood in terms of classical reflex arcs that are engaged by descending (corticospinal) predictions. This provides a formalism that generalizes the equilibrium point solution – to the degrees of freedom problem – to movement trajectories.
 
  
 
当梯度下降应用于动作<math>\dot{a}=-\partial_aF(s,\tilde{\mu})</math>时,运动控制可以理解为通过下降(皮质脊髓)预测参与的经典反射弧。这提供了一种形式化方法,将(针对自由度问题的)平衡点解推广到运动轨迹。
  
In predictive coding, optimising model parameters through a gradient descent on the time integral of free energy (free action) reduces to associative or [[Hebbian theory|Hebbian plasticity]] and is associated with [[synaptic plasticity]] in the brain.
 
  
 
在预测编码中,通过对自由能的时间积分(自由作用量)进行梯度下降来优化模型参数,可归结为联想可塑性或[[Hebbian理论| Hebbian可塑性]],并与大脑中的[[synaptic可塑性]]相关。
第270行: 第237行:
 
=== Perceptual precision, attention and salience 知觉的精确性、注意力和显著性===
 
  
Active inference is related to optimal control by replacing value or cost-to-go functions with prior beliefs about state transitions or flow. This exploits the close connection between Bayesian filtering and the solution to the Bellman equation. However, active inference starts with (priors over) flow <math> f = \Gamma \cdot \nabla V + \nabla \times W </math> that are specified with scalar <math> V(x) </math>  and vector <math> W(x) </math> value functions of state space (c.f., the Helmholtz decomposition).  Here, <math> \Gamma </math> is the amplitude of random fluctuations and cost is <math> c(x) = f \cdot \nabla V + \nabla \cdot \Gamma \cdot V</math>.  The priors over flow <math> p(\tilde{x}\mid m) </math> induce a prior over states <math> p(x\mid m) = \exp (V(x)) </math> that is the solution to the appropriate forward Kolmogorov equations. In contrast, optimal control optimises the flow, given a cost function, under the assumption that <math> W = 0 </math> (i.e., the flow is curl free or has detailed balance). Usually, this entails solving backward Kolmogorov equations.
 
  
 
主动推理与最优控制相关:它用关于状态转移或流的先验信念替换价值函数或未来代价(cost-to-go)函数。这利用了贝叶斯滤波与贝尔曼方程解之间的紧密联系。然而,主动推理从流 <math> f = \Gamma \cdot \nabla V + \nabla \times W </math> 的先验出发,该流由状态空间上的标量值函数 <math> V(x) </math> 和向量值函数 <math> W(x) </math> 指定(参见亥姆霍兹分解)。这里,<math> \Gamma </math> 是随机涨落的振幅,成本为 <math> c(x) = f \cdot \nabla V + \nabla \cdot \Gamma \cdot V</math>。流的先验 <math> p(\tilde{x}\mid m) </math> 诱导出状态的先验 <math> p(x\mid m) = \exp (V(x)) </math>,后者是相应前向 Kolmogorov 方程的解。相比之下,最优控制在给定成本函数的情况下优化流,并假设 <math> W = 0 </math>(即流是无旋的,或满足细致平衡)。通常,这需要求解后向 Kolmogorov 方程。
  
 
 
Optimizing the precision parameters corresponds to optimizing the gain of prediction errors (cf. Kalman gain). In neuronally plausible implementations of predictive coding,<ref name="Mumford" /> this corresponds to optimizing the excitability of superficial pyramidal cells and has been interpreted in terms of attentional gain.<ref name="Feldman">Feldman, H., & Friston, K. J. (2010). [http://www.fil.ion.ucl.ac.uk/~karl/Attention%20uncertainty%20and%20free-energy.pdf Attention, uncertainty, and free-energy]. Frontiers in Human Neuroscience, 4, 215.</ref>
  
 
[[File:PESAIM.jpg|thumb|Simulation of the results of a selective attention task carried out by PE-SAIM, a Bayesian reformulation of SAIM, in a multiple-object environment. The graphs show the time course of activation for the FOA and the two template units in the Knowledge Network.]]
  
Optimal decision problems (usually formulated as partially observable Markov decision processes) are treated within active inference by absorbing utility functions into prior beliefs. In this setting, states that have a high utility (low cost) are states an agent expects to occupy. By equipping the generative model with hidden states that model control, policies (control sequences) that minimise variational free energy lead to high utility states.
  
 
 
Concerning the top-down versus bottom-up controversy, which has been addressed as a major open problem of attention, a computational model has succeeded in illustrating the circular nature of the reciprocation between top-down and bottom-up mechanisms. Using an established emergent model of attention, namely SAIM, the authors proposed a model called PE-SAIM that, in contrast to the standard version, approaches selective attention from a top-down stance. The model takes into account the forward prediction errors sent to the same level or a level above, to minimise the energy function that indicates the difference between data and its cause or, in other words, between the generative model and the posterior. To enhance validity, they also incorporated neural competition between the stimuli in their model. A notable feature of this model is the reformulation of the free energy function only in terms of prediction errors during task performance.
  
Neurobiologically, neuromodulators like dopamine are considered to report the precision of prediction errors by modulating the gain of principal cells encoding prediction error. This is closely related to, but formally distinct from, the role of dopamine in reporting prediction errors per se and related computational accounts.
 
<math>\dfrac{\partial E^{total}(Y^{VP},X^{SN},x^{CN},y^{KN})}{\partial y^{SN}_{mn}}=x^{CN}_{mn}-b^{CN}\varepsilon^{CN}_{nm}+b^{CN}\sum_{k}(\varepsilon^{KN}_{knm})</math>
 
  
 
 
where <math>E^{total}</math> is the total [[energy function]] of the neural networks, and <math>\varepsilon^{KN}_{knm}</math> is the prediction error between the generative model (prior) and the posterior, which changes over time.<ref name="Abadi">Abadi K.A., Yahya K., Amini M., Heinke D. & Friston, K. J. (2019). [https://royalsocietypublishing.org/doi/full/10.1098/rsif.2018.0344 Excitatory versus inhibitory feedback in Bayesian formulations of scene construction]. J. R. Soc. Interface, 16.</ref>
  
Active inference has been used to address a range of issues in cognitive neuroscience, brain function and neuropsychiatry, including action observation, mirror neurons, saccades and visual search, eye movements, sleep, illusions, attention, hysteria and psychosis. Explanations of action in active inference often depend on the idea that the brain has 'stubborn predictions' that it cannot update, leading to actions that cause these predictions to come true.
  
Comparing the two models reveals a notable similarity between their results while also highlighting a remarkable discrepancy: in the standard version of SAIM, the model's focus is mainly on the excitatory connections, whereas in PE-SAIM the inhibitory connections are leveraged to make an inference. The model has also proved fit to predict EEG and fMRI data drawn from human experiments with high precision.
== Active inference ==
  
 
 
When gradient descent is applied to action <math> \dot{a} = -\partial_aF(s,\tilde{\mu}) </math>, motor control can be understood in terms of classical reflex arcs that are engaged by descending (corticospinal) predictions. This provides a formalism that generalizes the equilibrium point solution (to the [[degrees of freedom problem]]<ref>Feldman, A. G., & Levin, M. F. (1995). [http://e.guigon.free.fr/rsc/article/FeldmanLevin95.pdf The origin and use of positional frames of reference in motor control]. Behav Brain Sci., 18, 723–806.</ref>) to movement trajectories.
=== Active inference and optimal control ===
  
 
 
Active inference is related to [[optimal control]] by replacing value or cost-to-go functions with prior beliefs about state transitions or flow.<ref>Friston, K., (2011). [http://www.fil.ion.ucl.ac.uk/~karl/What%20Is%20Optimal%20about%20Motor%20Control.pdf What is optimal about motor control?]. Neuron, 72(3), 488–98.</ref> This exploits the close connection between Bayesian filtering and the solution to the [[Bellman equation]]. However, active inference starts with (priors over) flow <math> f = \Gamma \cdot \nabla V + \nabla \times W </math> that is specified with scalar <math> V(x) </math> and vector <math> W(x) </math> value functions of state space (cf. the [[Helmholtz decomposition]]). Here, <math> \Gamma </math> is the amplitude of random fluctuations and cost is <math> c(x) = f \cdot \nabla V + \nabla \cdot \Gamma \cdot V</math>. The priors over flow <math> p(\tilde{x}\mid m) </math> induce a prior over states <math> p(x\mid m) = \exp (V(x)) </math> that is the solution to the appropriate forward [[Kolmogorov equations]].<ref>Friston, K., & Ao, P. (2012). [http://www.fil.ion.ucl.ac.uk/~karl/Free%20Energy%20Value%20and%20Attractors.pdf Free-energy, value and attractors]. Computational and mathematical methods in medicine, 2012, 937860.</ref> In contrast, optimal control optimises the flow, given a cost function, under the assumption that <math> W = 0 </math> (i.e., the flow is curl free or has detailed balance). Usually, this entails solving backward [[Kolmogorov equations]].<ref>Kappen, H., (2005). [https://arxiv.org/abs/physics/0505066 Path integrals and symmetry breaking for optimal control theory]. Journal of Statistical Mechanics: Theory and Experiment, 11, p. P11011.</ref>
=== Active inference and optimal decision (game) theory ===
  
 
 
[[Optimal decision]] problems (usually formulated as [[partially observable Markov decision process]]es) are treated within active inference by absorbing [[Utility|utility functions]] into prior beliefs. In this setting, states that have a high utility (low cost) are states an agent expects to occupy. By equipping the generative model with hidden states that model control, policies (control sequences) that minimise variational free energy lead to high utility states.<ref>Friston, K., Samothrakis, S. & Montague, R., (2012). [http://www.fil.ion.ucl.ac.uk/~karl/Active%20inference%20and%20agency%20optimal%20control%20without%20cost%20functions.pdf Active inference and agency: optimal control without cost functions]. Biol. Cybernetics, 106(8–9), 523–41.</ref>
  
Neurobiologically, neuromodulators like [[dopamine]] are considered to report the precision of prediction errors by modulating the gain of principal cells encoding prediction error.<ref name="Friston_a">Friston, K. J. Shiner T, FitzGerald T, Galea JM, Adams R, Brown H, Dolan RJ, Moran R, Stephan KE, Bestmann S. (2012). [http://www.fil.ion.ucl.ac.uk/~karl/Dopamine%20Affordance%20and%20Active%20Inference.pdf Dopamine, affordance and active inference]. PLoS Comput. Biol., 8(1), p. e1002327.</ref> This is closely related to, but formally distinct from, the role of dopamine in reporting prediction errors ''per se''<ref>Fiorillo, C. D., Tobler, P. N. & Schultz, W., (2003). [http://e.guigon.free.fr/rsc/article/FiorilloEtAl03.pdf Discrete coding of reward probability and uncertainty by dopamine neurons]. Science, 299(5614), 1898–902.</ref> and related computational accounts.<ref>Frank, M. J., (2005). [http://ski.cog.brown.edu/papers/Frank_JOCN.pdf Dynamic dopamine modulation in the basal ganglia: a neurocomputational account of cognitive deficits in medicated and nonmedicated Parkinsonism]. J Cogn Neurosci., Jan, 1, 51–72.</ref>

Revision as of 20:33, 19 February 2021 (Friday)

{{Short description|Hypothesis in neuroscience proposed by Karl J. Friston}}


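The free energy principle is a formal statement that explains how living and non-living systems remain in non-equilibrium steady-states by restricting themselves to a limited number of states.[1] It establishes that systems minimise a free energy function of their internal states, which entail beliefs about hidden states in their environment. The implicit minimisation of free energy is formally related to variational Bayesian methods and was originally introduced by Karl Friston as an explanation for embodied perception in neuroscience,[2] where it is also known as active inference.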


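The free energy principle explains the existence of a given system by modelling it through a Markov blanket that tries to minimise the difference between its model of the world and its sensations and associated perception. This difference can be described as "surprise" and is reduced by continuously correcting the system's model of the world. As such, the principle is based on the Bayesian idea of the brain as an "inference engine". Friston added a second route to minimisation: action. By actively changing the world into the expected state, a system can also minimise its free energy. Friston assumes this to be the principle of all biological reaction.[3] Friston also believes that his principle applies to mental disorders as well as to artificial intelligence. AI implementations based on the active inference principle have shown advantages over other methods.[3]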


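The free energy principle has been criticised for being very difficult to understand, even for experts.[4] Discussions of the principle have also been criticised for invoking metaphysical assumptions far removed from testable scientific predictions, making the principle unfalsifiable.[5] In a 2018 interview, Friston acknowledged that the free energy principle is not properly falsifiable: "the free energy principle is what it is: a principle. Like Hamilton's principle of stationary action, it cannot be falsified. It cannot be disproven. In fact, there's not much you can do with it, unless you ask whether measurable systems conform to the principle."[6]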


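Background

The notion that self-organising biological systems, such as a cell or a brain, can be understood as minimising variational free energy is based on Helmholtz's work on unconscious inference[7] and subsequent treatments in psychology[8] and machine learning.[9] Variational free energy is a function of observations and a probability density over their hidden causes. This variational density is defined in relation to a probabilistic model that generates predicted observations from hypothesised causes. In this setting, free energy provides an approximation to Bayesian model evidence.[10] Therefore, its minimisation can be seen as a Bayesian inference process. When a system actively makes observations to minimise free energy, it implicitly performs active inference and maximises the evidence for its model of the world.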


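However, free energy is also an upper bound on the self-information of outcomes, and the long-term average of surprise is entropy. This means that if a system acts to minimise free energy, it implicitly places an upper bound on the entropy of the outcomes (or sensory states) it samples.[11][12][better source needed]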


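Relationship to other theories

Active inference is closely related to the good regulator theorem[13] and related accounts of self-organisation,[14][15] such as self-assembly, pattern formation, autopoiesis[16] and practopoiesis.[17] It addresses the themes considered in cybernetics, synergetics[18] and embodied cognition. Because free energy can be expressed as the expected energy of observations under the variational density minus its entropy, it is also related to the maximum entropy principle.[19] Finally, because the time average of energy is action, the principle of minimum variational free energy is a principle of least action.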


These schematics illustrate the partition of states into internal states and hidden (external) states, which are separated by a Markov blanket comprising sensory and active states. The lower panel shows this partition as it would be applied to action and perception in the brain, where active and internal states minimise a free energy functional of sensory states. The ensuing self-organisation of internal states then corresponds to perception, while action couples brain states back to external states. The upper panel shows exactly the same dependencies, rearranged so that the internal states are associated with the intracellular states of a cell, while the sensory states become the surface states of the cell membrane overlying active states (e.g., the actin filaments of the cytoskeleton).

Definition

Definition (continuous formulation): Active inference rests on the tuple [math]\displaystyle{ (\Omega,\Psi,S,A,R,q,p) }[/math],

  • A sample space [math]\displaystyle{ \Omega }[/math] – from which random fluctuations [math]\displaystyle{ \omega \in \Omega }[/math] are drawn
  • Hidden or external states [math]\displaystyle{ \Psi:\Psi\times A \times \Omega \to \mathbb{R} }[/math] – that cause sensory states and depend on action
  • Sensory states [math]\displaystyle{ S:\Psi \times A \times \Omega \to \mathbb{R} }[/math] – a probabilistic mapping from action and hidden states
  • Action [math]\displaystyle{ A:S\times R \to \mathbb{R} }[/math] – that depends on sensory and internal states
  • Internal states [math]\displaystyle{ R:R\times S \to \mathbb{R} }[/math] – that cause action and depend on sensory states
  • Generative density [math]\displaystyle{ p(s, \psi \mid m) }[/math] – over sensory and hidden states under a generative model [math]\displaystyle{ m }[/math]
  • Variational density [math]\displaystyle{ q(\psi \mid \mu) }[/math] – over hidden states [math]\displaystyle{ \psi \in \Psi }[/math] that is parameterised by internal states [math]\displaystyle{ \mu \in R }[/math]

Action and perception

The objective is to maximise model evidence [math]\displaystyle{ p(s\mid m) }[/math] or minimise surprise [math]\displaystyle{ -\log p(s\mid m) }[/math]. This generally involves an intractable marginalisation over hidden states, so surprise is replaced with an upper variational free energy bound.[9] However, this means that internal states must also minimise free energy, because free energy is a function of sensory and internal states:

[math]\displaystyle{ a(t) = \underset{a}{\operatorname{arg\,min}} \{ F(s(t),\mu(t)) \} }[/math]

[math]\displaystyle{ \mu(t) = \underset{\mu}{\operatorname{arg\,min}} \{ F(s(t),\mu) \} }[/math]

[math]\displaystyle{ \underset{\mathrm{free-energy}} {\underbrace{F(s,\mu)}} = \underset{\mathrm{energy}} {\underbrace{ E_q[-\log p(s,\psi \mid m)]}} - \underset{\mathrm{entropy}} {\underbrace{ H[q(\psi \mid \mu)]}} = \underset{\mathrm{surprise}} {\underbrace{ -\log p(s \mid m)}} + \underset{\mathrm{divergence}} {\underbrace{ D_{\mathrm{KL}}[q(\psi \mid \mu) \parallel p(\psi \mid s,m)]}} \geq \underset{\mathrm{surprise}} {\underbrace{ -\log p(s \mid m)}} }[/math]


This induces a dual minimisation with respect to action and internal states, which correspond to action and perception respectively.

Free energy minimisation

Free energy minimisation and self-organisation

Free energy minimisation has been proposed as a hallmark of self-organising systems.[20] This formulation rests on a Markov blanket (comprising action and sensory states) that separates internal and external states. If internal states and action minimise free energy, then they place an upper bound on the entropy of sensory states:

[math]\displaystyle{ \lim_{T\to\infty} \frac{1}{T} \underset{\text{free-action}} {\underbrace{\int_0^T F(s(t),\mu (t))\,dt}} \ge \lim_{T\to\infty} \frac{1}{T} \int_0^T \underset{\text{surprise}}{\underbrace{-\log p(s(t)\mid m)}} \, dt = H[p(s\mid m)] }[/math]


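This is because, under ergodic assumptions, the long-term average of surprise is entropy. This bound resists a natural tendency to disorder of the sort associated with the second law of thermodynamics and the fluctuation theorem.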
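The claim that the long-term average of surprise is entropy can be checked numerically. The following is a minimal Python sketch, assuming independent draws from a fixed discrete distribution (the simplest ergodic case); the distribution `p_s`, the sample size and the function names are illustrative assumptions and do not come from the source.

```python
import math
import random


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def average_surprise(p, n_samples, seed=0):
    """Average of -log p(s) over n_samples independent draws s ~ p."""
    rng = random.Random(seed)
    outcomes = rng.choices(range(len(p)), weights=p, k=n_samples)
    return sum(-math.log(p[s]) for s in outcomes) / n_samples


# Hypothetical sensory density p(s | m) over three outcomes.
p_s = [0.7, 0.2, 0.1]
H = entropy(p_s)
avg = average_surprise(p_s, n_samples=200_000)
print(f"entropy = {H:.4f}, average surprise = {avg:.4f}")
```

With more samples the two numbers agree more closely, which is the sense in which bounding average surprise also bounds the entropy of sensory states.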

Free energy minimisation and Bayesian inference

All Bayesian inference can be cast in terms of free energy minimisation (see, e.g.,[21][failed verification]). When free energy is minimised with respect to internal states, the Kullback–Leibler divergence between the variational and posterior density over hidden states is minimised. This corresponds to approximate Bayesian inference when the form of the variational density is fixed, and to exact Bayesian inference otherwise. Free energy minimisation therefore provides a generic description of Bayesian inference and filtering (e.g., Kalman filtering). It is also used in Bayesian model selection, where free energy can be usefully decomposed into complexity and accuracy:

[math]\displaystyle{ \underset{\text{free-energy}} {\underbrace{ F(s,\mu)}} = \underset{\text{complexity}} {\underbrace{ D_\mathrm{KL}[q(\psi\mid\mu)\parallel p(\psi\mid m)]}} - \underset{\mathrm{accuracy}} {\underbrace{E_q[\log p(s\mid\psi,m)]}} }[/math]


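Models with minimum free energy provide an accurate explanation of data, under complexity costs (cf. Occam's razor and more formal treatments of computational costs[22]). Here, complexity is the divergence between the variational density and prior beliefs about hidden states (i.e., the effective degrees of freedom used to explain the data).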
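The complexity/accuracy decomposition and the surprise-plus-divergence form of free energy can be compared on a small example. The sketch below assumes a discrete hidden state with two values and a single observed outcome; the numbers in `prior`, `lik` and `q` are hypothetical, and the helper names are not taken from the source.

```python
import math


def kl(q, p):
    """Kullback-Leibler divergence D_KL[q || p] for discrete densities."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def free_energy(q, prior, lik):
    """F = complexity - accuracy for one observed outcome s.

    q     : variational density q(psi | mu)
    prior : prior p(psi | m)
    lik   : likelihood p(s | psi, m) evaluated at the observed s
    """
    complexity = kl(q, prior)
    accuracy = sum(qi * math.log(li) for qi, li in zip(q, lik) if qi > 0)
    return complexity - accuracy


def surprise_and_posterior(prior, lik):
    """Return -log p(s | m) and the exact posterior p(psi | s, m)."""
    evidence = sum(pi * li for pi, li in zip(prior, lik))
    posterior = [pi * li / evidence for pi, li in zip(prior, lik)]
    return -math.log(evidence), posterior


prior = [0.5, 0.5]   # hypothetical p(psi | m)
lik = [0.9, 0.2]     # hypothetical p(s | psi, m) for the observed s
q = [0.6, 0.4]       # an arbitrary variational density

F = free_energy(q, prior, lik)
surprise, posterior = surprise_and_posterior(prior, lik)
print(F, surprise + kl(q, posterior))  # the two forms of F agree
```

Setting `q` equal to `posterior` drives the divergence to zero, so free energy then equals surprise; any other `q` gives a strictly larger value.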


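Free energy minimisation and thermodynamics

Variational free energy is an information-theoretic functional and is distinct from thermodynamic (Helmholtz) free energy.[23] However, the complexity term of variational free energy shares the same fixed point as Helmholtz free energy (under the assumption that the system is thermodynamically closed but not isolated). This is because if sensory perturbations are suspended (for a suitably long period of time), complexity is minimised (because accuracy can be neglected). At this point, the system is at equilibrium and internal states minimise Helmholtz free energy, by the principle of minimum energy.[24]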

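Free energy minimisation and information theory

Free energy minimisation is equivalent to maximising the mutual information between sensory states and the internal states that parameterise the variational density (for a fixed-entropy variational density).[11][better source needed] This relates free energy minimisation to the principle of minimum redundancy[25] and to related treatments that use information theory to describe optimal behaviour.[26][27]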


Free energy minimisation in neuroscience

Free energy minimisation provides a useful way to formulate normative (Bayes-optimal) models of neuronal inference and learning under uncertainty[28] and therefore subscribes to the Bayesian brain hypothesis.[29] The neuronal processes described by free energy minimisation depend on the nature of hidden states: [math]\displaystyle{ \Psi = X \times \Theta \times \Pi }[/math], which can comprise time-dependent variables, time-invariant parameters and the precision (inverse variance or temperature) of random fluctuations. Minimising variables, parameters and precision corresponds to inference, learning and the encoding of uncertainty, respectively.


Perceptual inference and categorisation

Free energy minimisation formalises the notion of unconscious inference in perception[7][9] and provides a normative (Bayesian) theory of neuronal processing. The associated process theory of neuronal dynamics is based on minimising free energy through gradient descent. This corresponds to generalised Bayesian filtering (where ~ denotes a variable in generalised coordinates of motion and [math]\displaystyle{ D }[/math] is a derivative matrix operator):[30]

[math]\displaystyle{ \dot{\tilde{\mu}} = D \tilde{\mu} - \partial_{\mu}F(s,\mu)\Big|_{\mu = \tilde{\mu}} }[/math]
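A much reduced version of this gradient descent can be written down for a static scalar model. The sketch below assumes a Gaussian prior and a Gaussian likelihood with an identity mapping, and it drops the [math]\displaystyle{ D\tilde{\mu} }[/math] term because nothing moves; under these assumptions free energy (up to constants) is a sum of precision-weighted squared prediction errors. The parameter values and function names are illustrative and not from the source.

```python
def dF_dmu(mu, s, mu_prior, pi_s, pi_p):
    """Gradient of F(mu) = pi_s/2 * (s - mu)**2 + pi_p/2 * (mu - mu_prior)**2."""
    sensory_error = s - mu        # ascending (bottom-up) prediction error
    prior_error = mu - mu_prior   # error with respect to the prior belief
    return -pi_s * sensory_error + pi_p * prior_error


def perceive(s, mu_prior, pi_s, pi_p, dt=0.05, steps=200):
    """Euler integration of d(mu)/dt = -dF/dmu, starting from the prior mean."""
    mu = mu_prior
    for _ in range(steps):
        mu -= dt * dF_dmu(mu, s, mu_prior, pi_s, pi_p)
    return mu


# Hypothetical values: observation s = 2, prior mean 0,
# sensory precision 4 and prior precision 1.
mu_hat = perceive(s=2.0, mu_prior=0.0, pi_s=4.0, pi_p=1.0)
exact = (4.0 * 2.0 + 1.0 * 0.0) / (4.0 + 1.0)  # closed-form posterior mean
print(mu_hat, exact)
```

In this linear Gaussian case the fixed point of the descent is the exact posterior mean, which makes the sketch easy to verify; the generalised filtering scheme in the text covers non-linear, dynamic models that this toy example does not.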


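Usually, the generative models that define free energy are non-linear and hierarchical (like cortical hierarchies in the brain). Special cases of generalised filtering include Kalman filtering, which is formally equivalent to predictive coding,[31] a popular metaphor for message passing in the brain. Under hierarchical models, predictive coding involves the recurrent exchange of ascending (bottom-up) prediction errors and descending (top-down) predictions[32] that is consistent with the anatomy and physiology of sensory[33] and motor systems.[34]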

Perceptual learning and memory

In predictive coding, optimising model parameters through a gradient ascent on the time integral of free energy (free action) reduces to associative or Hebbian plasticity and is associated with synaptic plasticity in the brain.

Perceptual precision, attention and salience

Optimizing the precision parameters corresponds to optimizing the gain of prediction errors (cf. Kalman gain). In neuronally plausible implementations of predictive coding,[32] this corresponds to optimizing the excitability of superficial pyramidal cells and has been interpreted in terms of attentional gain.[35]
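The link between precision and gain can be made concrete in the scalar Gaussian case. The sketch below assumes a one-dimensional belief with prior precision `pi_p` and sensory precision `pi_s`, and compares the resulting weight on the prediction error with the textbook scalar Kalman gain; all names and numbers are illustrative assumptions.

```python
def precision_gain(pi_s, pi_p):
    """Weight given to the sensory prediction error, from precisions."""
    return pi_s / (pi_s + pi_p)


def kalman_gain(P, R):
    """Scalar Kalman gain for prior variance P and observation noise variance R."""
    return P / (P + R)


def update(mu_prior, s, pi_s, pi_p):
    """Posterior mean: prior mean plus gain times prediction error."""
    return mu_prior + precision_gain(pi_s, pi_p) * (s - mu_prior)


# Precision is inverse variance, so pi_s = 4 and pi_p = 1
# correspond to R = 0.25 and P = 1.
print(precision_gain(4.0, 1.0), kalman_gain(1.0, 0.25))
print(update(mu_prior=0.0, s=2.0, pi_s=4.0, pi_p=1.0))
```

Raising the sensory precision raises the gain, so the same prediction error moves the belief further; this is the sense in which precision optimisation resembles attentional gain control in this simplified setting.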


在多目标环境下,通过对名为PE-SAIM的SAIM进行贝叶斯重构,模拟选择性注意任务的结果。图表显示了知识网络中FOA和两个模板单元激活的时间过程。


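Concerning the top-down versus bottom-up controversy, which has been addressed as a major open problem of attention, a computational model has succeeded in illustrating the circular nature of the reciprocation between top-down and bottom-up mechanisms. Using an established emergent model of attention, namely SAIM, the authors proposed a model called PE-SAIM that, in contrast to the standard version, approaches selective attention from a top-down stance. The model takes into account the forward prediction errors sent to the same level or a level above, in order to minimise the energy function indicating the difference between data and its cause or, in other words, between the generative model and the posterior. To enhance validity, they also incorporated neural competition between the stimuli in their model. A notable feature of this model is the reformulation of the free energy function only in terms of prediction errors during task performance.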


[math]\displaystyle{ \dfrac{\partial E^{total}(Y^{VP},X^{SN},x^{CN},y^{KN})}{\partial y^{SN}_{mn}}=x^{CN}_{mn}-b^{CN}\varepsilon^{CN}_{nm}+b^{CN}\sum_{k}(\varepsilon^{KN}_{knm}) }[/math]


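where [math]\displaystyle{ E^{total} }[/math] is the total energy function of the neural networks and [math]\displaystyle{ \varepsilon^{KN}_{knm} }[/math] is the prediction error between the generative model (prior) and the posterior, which changes over time.[36]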


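Comparing the two models reveals a notable similarity between their results, while pointing out a remarkable discrepancy: in the standard version of SAIM the model's focus is mainly on the excitatory connections, whereas in PE-SAIM the inhibitory connections are leveraged to make an inference. The model has also proved fit to predict EEG and fMRI data drawn from human experiments with high precision.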

Active inference

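When gradient descent is applied to action, [math]\displaystyle{ \dot{a}=-\partial_{a}F(s,\tilde{\mu}) }[/math], motor control can be understood in terms of classical reflex arcs that are engaged by descending (corticospinal) predictions. This provides a formalism that generalises the equilibrium point solution to the degrees of freedom problem[37] to movement trajectories.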

Active inference and optimal control

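Active inference is related to optimal control by replacing value or cost-to-go functions with prior beliefs about state transitions or flow.[38] This exploits the close connection between Bayesian filtering and the solution to the Bellman equation. However, active inference starts with (priors over) flow [math]\displaystyle{ f=\Gamma\cdot\nabla V+\nabla\times W }[/math] that is specified with scalar [math]\displaystyle{ V(x) }[/math] and vector [math]\displaystyle{ W(x) }[/math] value functions of state space (cf. the Helmholtz decomposition). Here, [math]\displaystyle{ \Gamma }[/math] is the amplitude of random fluctuations and cost is [math]\displaystyle{ c(x)=f\cdot\nabla V+\nabla\cdot\Gamma\cdot V }[/math]. The priors over flow [math]\displaystyle{ p(\tilde{x}\mid m) }[/math] induce a prior over states [math]\displaystyle{ p(x\mid m)=\exp(V(x)) }[/math] that is the solution to the appropriate forward Kolmogorov equations.[39] In contrast, optimal control optimises the flow, given a cost function, under the assumption that [math]\displaystyle{ W=0 }[/math] (i.e., the flow is curl free or has detailed balance). Usually, this entails solving backward Kolmogorov equations.[40]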
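The claim that [math]\displaystyle{ \exp(V(x)) }[/math] solves the forward Kolmogorov equation for a flow with both gradient and curl components can be checked numerically in a simple case. The sketch below assumes a two-dimensional state, a quadratic [math]\displaystyle{ V }[/math], a scalar [math]\displaystyle{ \Gamma }[/math], a rotational component derived from a hypothetical stream function, and the convention that the stationary density satisfies [math]\displaystyle{ -\nabla\cdot(fp)+\Gamma\nabla^{2}p=0 }[/math]. None of these specific choices comes from the cited work.

```python
import math

GAMMA = 0.5   # amplitude of random fluctuations (assumed scalar)
OMEGA = 2.0   # strength of the rotational flow (hypothetical)


def V(x, y):
    """Scalar value function; exp(V) is an unnormalised Gaussian."""
    return -0.5 * (x * x + y * y)


def flow(x, y):
    """f = GAMMA * grad V + curl W, with stream function W = OMEGA * (x^2 + y^2) / 2."""
    return (-GAMMA * x + OMEGA * y, -GAMMA * y - OMEGA * x)


def density(x, y):
    return math.exp(V(x, y))


def stationarity_residual(x, y, h=1e-3):
    """Central finite-difference estimate of -div(f p) + GAMMA * laplacian(p)."""
    def flux(u, v):
        fx, fy = flow(u, v)
        p = density(u, v)
        return fx * p, fy * p

    div = ((flux(x + h, y)[0] - flux(x - h, y)[0])
           + (flux(x, y + h)[1] - flux(x, y - h)[1])) / (2 * h)
    lap = (density(x + h, y) + density(x - h, y)
           + density(x, y + h) + density(x, y - h) - 4 * density(x, y)) / (h * h)
    return -div + GAMMA * lap


residual = stationarity_residual(0.3, -0.7)
```

The residual is zero up to finite-difference error for any value of `OMEGA`, since the rotational part of the flow is divergence free and runs along the contours of the density; setting `OMEGA = 0` gives the curl-free case assumed in optimal control.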

Active inference and optimal decision (game) theory

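Optimal decision problems (usually formulated as partially observable Markov decision processes) are treated within active inference by absorbing utility functions into prior beliefs. In this setting, states that have a high utility (low cost) are states an agent expects to occupy. By equipping the generative model with hidden states that model control, policies (control sequences) that minimise variational free energy lead to high utility states.[41]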
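The idea of absorbing utility into prior beliefs can be sketched with a small discrete example. The code below is a deliberately simplified proxy rather than a full variational free energy computation: it assumes three states, a hypothetical utility vector, a prior over states proportional to the exponential of utility, and two hypothetical policies whose predicted state distributions are given directly. Each policy is scored by the Kullback-Leibler divergence between its predicted states and the prior.

```python
import math


def softmax(values):
    """Normalised exponential, used to turn utilities into a prior over states."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(q, p):
    """KL[q || p] for discrete distributions given as lists."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


utility = [0.0, 1.0, 3.0]              # hypothetical utilities of three states
prior_over_states = softmax(utility)   # high-utility states are expected a priori

# Hypothetical predicted state distributions under two policies.
predicted = {
    "stay": [0.8, 0.1, 0.1],
    "go": [0.1, 0.1, 0.8],
}

scores = {name: kl_divergence(q, prior_over_states) for name, q in predicted.items()}
best_policy = min(scores, key=scores.get)
```

The policy whose predicted states concentrate on the high-utility state has the smaller divergence from the prior and is selected, without any explicit cost function being optimised.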


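Neurobiologically, neuromodulators such as dopamine are considered to report the precision of prediction errors by modulating the gain of principal cells encoding prediction error.[42] This is closely related to, but formally distinct from, the role of dopamine in reporting prediction errors per se[43] and related computational accounts.[44]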

Active inference and cognitive neuroscience

Active inference has been used to address a range of issues in cognitive neuroscience, brain function and neuropsychiatry, including: action observation,[45] mirror neurons,[46] saccades and visual search,[47][48] eye movements,[49] sleep,[50] illusions,[51] attention,[35] action selection,[42] consciousness,[52][53] hysteria[54] and psychosis.[55] Explanations of action in active inference often depend on the idea that the brain has 'stubborn predictions' which it cannot update, leading to actions that cause these predictions to come true.[56]


See also

Category:Biological systems

Category:Systems theory

Category:Computational neuroscience

Category:Mathematical and theoretical biology


This page was moved from wikipedia:en:Free energy principle. Its edit history can be viewed at 自由能原理/edithistory

  1. Ashby, W. R. (1962). Principles of the self-organizing system. In Principles of Self-Organization: Transactions of the University of Illinois Symposium, H. Von Foerster and G. W. Zopf, Jr. (eds.), Pergamon Press: London, UK, pp. 255–278.
  2. Friston, Karl; Kilner, James; Harrison, Lee (2006). "A free energy principle for the brain" (PDF). Journal of Physiology-Paris. Elsevier BV. 100 (1–3): 70–87. doi:10.1016/j.jphysparis.2006.10.001. ISSN 0928-4257. PMID 17097864. S2CID 637885.
  3. Raviv, Shaun (2018-11-13). "The Genius Neuroscientist Who Might Hold the Key to True AI". Wired.
  4. Freed, Peter (2010). "Research Digest". Neuropsychoanalysis. Informa UK Limited. 12 (1): 103–106. doi:10.1080/15294145.2010.10773634. ISSN 1529-4145. S2CID 220306712.
  5. Colombo, Matteo; Wright, Cory (2018-09-10). "First principles in the life sciences: the free-energy principle, organicism, and mechanism". Synthese. Springer Science and Business Media LLC. doi:10.1007/s11229-018-01932-w. ISSN 0039-7857.
  6. Friston, Karl (2018). "Of woodlice and men: A Bayesian account of cognition, life and consciousness. An interview with Karl Friston (by Martin Fortier & Daniel Friedman)". ALIUS Bulletin. 2: 17–43.
  7. Helmholtz, H. (1866/1962). Concerning the perceptions in general. In Treatise on physiological optics (J. Southall, Trans., 3rd ed., Vol. III). New York: Dover.
  8. Gregory, R. L. (1980-07-08). "Perceptions as hypotheses". Philosophical Transactions of the Royal Society of London. B, Biological Sciences. The Royal Society. 290 (1038): 181–197. Bibcode:1980RSPTB.290..181G. doi:10.1098/rstb.1980.0090. ISSN 0080-4622. JSTOR 2395424. PMID 6106237.
  9. Dayan, Peter; Hinton, Geoffrey E.; Neal, Radford M.; Zemel, Richard S. (1995). "The Helmholtz Machine" (PDF). Neural Computation. MIT Press - Journals. 7 (5): 889–904. doi:10.1162/neco.1995.7.5.889. ISSN 0899-7667. PMID 7584891. S2CID 1890561.
  10. Beal, M. J. (2003). Variational Algorithms for Approximate Bayesian Inference. Ph.D. Thesis, University College London.
  11. Friston, Karl (2012-10-31). "A Free Energy Principle for Biological Systems" (PDF). Entropy. MDPI AG. 14 (11): 2100–2121. Bibcode:2012Entrp..14.2100K. doi:10.3390/e14112100. ISSN 1099-4300. PMC 3510653. PMID 23204829.
  12. Colombo, Matteo; Wright, Cory (2018-09-10). "First principles in the life sciences: the free-energy principle, organicism, and mechanism". Synthese. Springer Science and Business Media LLC. doi:10.1007/s11229-018-01932-w. ISSN 0039-7857.
  13. Conant, R. C., & Ashby, R. W. (1970). Every Good Regulator of a system must be a model of that system. Int. J. Systems Sci. , 1 (2), 89–97.
  14. Kauffman, S. (1993). The Origins of Order: Self-Organization and Selection in Evolution. Oxford: Oxford University Press.
  15. Nicolis, G., & Prigogine, I. (1977). Self-organization in non-equilibrium systems. New York: John Wiley.
  16. Maturana, H. R., & Varela, F. (1980). Autopoiesis: the organization of the living. In V. F. Maturana HR (Ed.), Autopoiesis and Cognition. Dordrecht, Netherlands: Reidel.
  17. Nikolić, D. (2015). Practopoiesis: Or how life fosters a mind. Journal of theoretical biology, 373, 40-61.
  18. Haken, H. (1983). Synergetics: An introduction. Non-equilibrium phase transition and self-organisation in physics, chemistry and biology (3rd ed.). Berlin: Springer Verlag.
  19. Jaynes, E. T. (1957). Information Theory and Statistical Mechanics. Physical Review Series II, 106 (4), 620–30.
  20. Crauel, H., & Flandoli, F. (1994). Attractors for random dynamical systems. Probab Theory Relat Fields, 100, 365–393.
  21. Roweis, S., & Ghahramani, Z. (1999). A unifying review of linear Gaussian models. Neural Computat. , 11 (2), 305–45. doi:10.1162/089976699300016674
  22. Ortega, P. A., & Braun, D. A. (2012). Thermodynamics as a theory of decision-making with information processing costs. Proceedings of the Royal Society A, vol. 469, no. 2153 (20120683) .
  23. Evans, D. J. (2003). A non-equilibrium free energy theorem for deterministic systems. Molecular Physics, 101, 1551–4.
  24. Jarzynski, C. (1997). Nonequilibrium equality for free energy differences. Phys. Rev. Lett., 78, 2690.
  25. Barlow, H. (1961). Possible principles underlying the transformations of sensory messages. In W. Rosenblith (Ed.), Sensory Communication (pp. 217–34). Cambridge, MA: MIT Press.
  26. Linsker, R. (1990). Perceptual neural organization: some approaches based on network models and information theory. Annu Rev Neurosci., 13, 257–81.
  27. Bialek, W., Nemenman, I., & Tishby, N. (2001). Predictability, complexity, and learning. Neural Computat., 13 (11), 2409–63.
  28. Friston, K. (2010). The free-energy principle: a unified brain theory? Nat Rev Neurosci. , 11 (2), 127–38.
  29. Knill, D. C., & Pouget, A. (2004). The Bayesian brain: the role of uncertainty in neural coding and computation. Trends Neurosci. , 27 (12), 712–9.
  30. Friston, K., Stephan, K., Li, B., & Daunizeau, J. (2010). Generalised Filtering. Mathematical Problems in Engineering, vol. 2010, 621670.
  31. Rao, R. P., & Ballard, D. H. (1999). Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat Neurosci. , 2 (1), 79–87.
  32. Mumford, D. (1992). On the computational architecture of the neocortex. II. Biol. Cybern., 66, 241–51.
  33. Bastos, A. M., Usrey, W. M., Adams, R. A., Mangun, G. R., Fries, P., & Friston, K. J. (2012). Canonical microcircuits for predictive coding. Neuron , 76 (4), 695–711.
  34. Adams, R. A., Shipp, S., & Friston, K. J. (2013). Predictions not commands: active inference in the motor system. Brain Struct Funct. , 218 (3), 611–43
  35. Feldman, H., & Friston, K. J. (2010). Attention, uncertainty, and free-energy. Frontiers in Human Neuroscience, 4, 215.
  36. Abadi K.A., Yahya K., Amini M., Heinke D. & Friston, K. J. (2019). Excitatory versus inhibitory feedback in Bayesian formulations of scene construction. J. R. Soc. Interface, 16.
  37. Feldman, A. G., & Levin, M. F. (1995). The origin and use of positional frames of reference in motor control. Behav Brain Sci. , 18, 723–806.
  38. Friston, K., (2011). What is optimal about motor control?. Neuron, 72(3), 488–98.
  39. Friston, K., & Ao, P. (2012). Free-energy, value and attractors. Computational and mathematical methods in medicine, 2012, 937860.
  40. Kappen, H., (2005). Path integrals and symmetry breaking for optimal control theory. Journal of Statistical Mechanics: Theory and Experiment, 11, p. P11011.
  41. Friston, K., Samothrakis, S. & Montague, R., (2012). Active inference and agency: optimal control without cost functions. Biol. Cybernetics, 106(8–9), 523–41.
  42. Friston, K. J., Shiner T, FitzGerald T, Galea JM, Adams R, Brown H, Dolan RJ, Moran R, Stephan KE, Bestmann S. (2012). Dopamine, affordance and active inference. PLoS Comput. Biol., 8(1), p. e1002327.
  43. Fiorillo, C. D., Tobler, P. N. & Schultz, W., (2003). Discrete coding of reward probability and uncertainty by dopamine neurons. Science, 299(5614), 1898–902.
  44. Frank, M. J., (2005). Dynamic dopamine modulation in the basal ganglia: a neurocomputational account of cognitive deficits in medicated and nonmedicated Parkinsonism. J Cogn Neurosci., Jan, 1, 51–72.
  45. Friston, K., Mattout, J. & Kilner, J., (2011). Action understanding and active inference. Biol Cybern., 104, 137–160.
  46. Kilner, J. M., Friston, K. J. & Frith, C. D., (2007). Predictive coding: an account of the mirror neuron system. Cogn Process., 8(3), pp. 159–66.
  47. Friston, K., Adams, R. A., Perrinet, L. & Breakspear, M., (2012). Perceptions as hypotheses: saccades as experiments. Front Psychol., 3, 151.
  48. Mirza, M., Adams, R., Mathys, C., Friston, K. (2018). Human visual exploration reduces uncertainty about the sensed world. PLoS One, 13(1): e0190429
  49. Perrinet L, Adams R, Friston, K. Active inference, eye movements and oculomotor delays. Biological Cybernetics, 108(6):777-801, 2014.
  50. Hobson, J. A. & Friston, K. J., (2012). Waking and dreaming consciousness: Neurobiological and functional considerations. Prog Neurobiol, 98(1), pp. 82–98.
  51. Brown, H., & Friston, K. J. (2012). Free-energy and illusions: the cornsweet effect. Front Psychol , 3, 43.
  52. Rudrauf, David; Bennequin, Daniel; Granic, Isabela; Landini, Gregory; Friston, Karl; Williford, Kenneth (2017-09-07). "A mathematical model of embodied consciousness" (PDF). Journal of Theoretical Biology. 428: 106–131. doi:10.1016/j.jtbi.2017.05.032. ISSN 0022-5193. PMID 28554611.
  53. Williford, K.; Bennequin, D.; Friston, K.; Rudrauf, D. (2018-12-17). "The Projective Consciousness Model and Phenomenal Selfhood". Frontiers in Psychology (in English). 9: 2571. doi:10.3389/fpsyg.2018.02571. PMC 6304424. PMID 30618988.
  54. Edwards, M. J., Adams, R. A., Brown, H., Pareés, I., & Friston, K. J. (2012). A Bayesian account of 'hysteria'. Brain , 135(Pt 11):3495–512.
  55. Adams RA, Perrinet LU, Friston K. (2012). Smooth pursuit and visual occlusion: active inference and oculomotor control in schizophrenia. PLoS One. , 12;7(10):e47502
  56. Yon, Daniel; Lange, Floris P. de; Press, Clare (2019-01-01). "The Predictive Brain as a Stubborn Scientist". Trends in Cognitive Sciences (in English). 23 (1): 6–8. doi:10.1016/j.tics.2018.10.003. ISSN 1364-6613. PMID 30429054. S2CID 53280000.
  57. Friston, K., Mattout, J. & Kilner, J., (2011). Action understanding and active inference. Biol Cybern., 104, 137–160.
  58. Kilner, J. M., Friston, K. J. & Frith, C. D., (2007). Predictive coding: an account of the mirror neuron system. Cogn Process., 8(3), pp. 159–66.
  59. Friston, K., Adams, R. A., Perrinet, L. & Breakspear, M., (2012). Perceptions as hypotheses: saccades as experiments. Front Psychol., 3, 151.
  60. Mirza, M., Adams, R., Mathys, C., Friston, K. (2018). Human visual exploration reduces uncertainty about the sensed world. PLoS One, 13(1): e0190429
  61. Perrinet L, Adams R, Friston, K. Active inference, eye movements and oculomotor delays. Biological Cybernetics, 108(6):777-801, 2014.
  62. Hobson, J. A. & Friston, K. J., (2012). Waking and dreaming consciousness: Neurobiological and functional considerations. Prog Neurobiol, 98(1), pp. 82–98.
  63. Brown, H., & Friston, K. J. (2012). Free-energy and illusions: the cornsweet effect. Front Psychol , 3, 43.
  64. Rudrauf, David; Bennequin, Daniel; Granic, Isabela; Landini, Gregory; Friston, Karl; Williford, Kenneth (2017-09-07). "A mathematical model of embodied consciousness" (PDF). Journal of Theoretical Biology. 428: 106–131. doi:10.1016/j.jtbi.2017.05.032. ISSN 0022-5193. PMID 28554611.
  65. Williford, K.; Bennequin, D.; Friston, K.; Rudrauf, D. (2018-12-17). "The Projective Consciousness Model and Phenomenal Selfhood". Frontiers in Psychology (in English). 9: 2571. doi:10.3389/fpsyg.2018.02571. PMC 6304424. PMID 30618988.
  66. Edwards, M. J., Adams, R. A., Brown, H., Pareés, I., & Friston, K. J. (2012). A Bayesian account of 'hysteria'. Brain , 135(Pt 11):3495–512.
  67. Adams RA, Perrinet LU, Friston K. (2012). Smooth pursuit and visual occlusion: active inference and oculomotor control in schizophrenia. PLoS One. , 12;7(10):e47502
  68. Yon, Daniel; Lange, Floris P. de; Press, Clare (2019-01-01). "The Predictive Brain as a Stubborn Scientist". Trends in Cognitive Sciences (in English). 23 (1): 6–8. doi:10.1016/j.tics.2018.10.003. ISSN 1364-6613. PMID 30429054. S2CID 53280000.