Changes

Conditional mutual information

Revision as of 21:15, 22 August 2021 (Sunday)

659 bytes removed, 21:15, 22 August 2021 (Sunday)

Minor edit

No edit summary

Line 4: Line 4: −

In ''' probability theory''', and particularly in ''' information theory''', the ''' conditional mutual information '''<ref name = Wyner1978>{{cite journal|last=Wyner|first=A. D. |title=A definition of conditional mutual information for arbitrary ensembles|url=|journal=Information and Control|year=1978|volume=38|issue=1|pages=51–59|doi=10.1016/s0019-9958(78)90026-8|doi-access=free}}</ref><ref name = Dobrushin1959>{{cite journal|last=Dobrushin|first=R. L. |title=General formulation of Shannon's main theorem in information theory|journal=Uspekhi Mat. Nauk|year=1959|volume=14|pages=3–104}}</ref> is, in its most basic form, the expected value of the mutual information of two random variables given the value of a third.

+

In '''probability theory''', and particularly in ''' information theory''', the ''' conditional mutual information '''<ref name="Wyner1978">{{cite journal|last=Wyner|first=A. D. |title=A definition of conditional mutual information for arbitrary ensembles|url=|journal=Information and Control|year=1978|volume=38|issue=1|pages=51–59|doi=10.1016/s0019-9958(78)90026-8|doi-access=free}}</ref><ref name="Dobrushin1959">{{cite journal|last=Dobrushin|first=R. L. |title=General formulation of Shannon's main theorem in information theory|journal=Uspekhi Mat. Nauk|year=1959|volume=14|pages=3–104}}</ref> is, in its most basic form, the expected value of the mutual information of two random variables given the value of a third.

+

==Definition==

−

== Definition ==

−

+

For random variables <math>X</math>, <math>Y</math>, and <math>Z</math> with ''' support sets''' <math>\mathcal{X}</math>, <math>\mathcal{Y}</math> and <math>\mathcal{Z}</math>, we define the conditional mutual information as:

−

For random variables <math>X</math>, <math>Y</math>, and <math>Z</math> with ''' support sets''' <math>\mathcal{X}</math>, <math>\mathcal{Y}</math> and <math>\mathcal{Z}</math>, we define the conditional mutual information as:

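Line 39: Line 38: −

Thus, by comparison with the definition of mutual information, <math>I(X;Y|Z)</math> can be expressed as the expected (with respect to <math>Z</math>) ''' Kullback–Leibler divergence ''' from the conditional joint distribution <math>P_{(X,Y)|Z}</math> to the product of the conditional marginals <math>P_{X|Z}</math> and <math>P_{Y|Z}</math>.

+

Thus, by comparison with the definition of mutual information, <math>I(X;Y|Z)</math> can be expressed as the expected (with respect to <math>Z</math>) ''' Kullback–Leibler divergence ''' from the conditional joint distribution <math>P_{(X,Y)|Z}</math> to the product of the conditional marginals <math>P_{X|Z}</math> and <math>P_{Y|Z}</math>.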
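The expected-divergence reading can be sketched in code for the discrete case. This is a minimal illustration and not part of the article: the function name and the dictionary input format are assumptions made for this sketch, and logarithms are taken base 2 so the result is in bits.

```python
import math


def conditional_mutual_information_kl(p_xyz):
    """Return I(X;Y|Z) in bits as E_Z[ D_KL(P_{(X,Y)|Z} || P_{X|Z} x P_{Y|Z}) ].

    p_xyz maps (x, y, z) tuples to probabilities (hypothetical input format).
    """
    # Marginal distribution of Z.
    p_z = {}
    for (x, y, z), p in p_xyz.items():
        p_z[z] = p_z.get(z, 0.0) + p

    total = 0.0
    for z0, pz in p_z.items():
        if pz == 0.0:
            continue
        # Conditional joint P(x, y | Z = z0).
        p_xy = {(x, y): p / pz for (x, y, z), p in p_xyz.items() if z == z0}
        # Conditional marginals P(x | Z = z0) and P(y | Z = z0).
        p_x, p_y = {}, {}
        for (x, y), p in p_xy.items():
            p_x[x] = p_x.get(x, 0.0) + p
            p_y[y] = p_y.get(y, 0.0) + p
        # KL divergence from the conditional joint to the product of marginals.
        kl = sum(
            p * math.log2(p / (p_x[x] * p_y[y]))
            for (x, y), p in p_xy.items()
            if p > 0
        )
        # Weight by P(Z = z0) to take the expectation over Z.
        total += pz * kl
    return total


# Assumed example: X and Y are independent fair bits and Z = X XOR Y.
xor = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
print(conditional_mutual_information_kl(xor))  # 1 bit, up to rounding
```

Once <math>Z</math> is known, <math>X</math> determines <math>Y</math> in this example, so each conditional divergence equals one bit.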

−

+

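==In terms of PMFs for discrete distributions==

−

== In terms of PMFs for discrete distributions ==

Line 58: Line 56: −

where the marginal, joint, and/or conditional '''probability mass functions ''' are denoted by <math>p</math> with the appropriate subscript. This can be simplified as:

+

where the marginal, joint, and/or conditional '''probability mass functions ''' are denoted by <math>p</math> with the appropriate subscript. This can be simplified as: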
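For a finite joint PMF, the standard single-sum form of the quantity, the sum of <math>p(x,y,z) \log \frac{p(z)\,p(x,y,z)}{p(x,z)\,p(y,z)}</math> over all outcomes, can be evaluated directly. The sketch below is illustrative only: the helper names, the dictionary input format, and the example chain are assumptions, and base-2 logarithms are used.

```python
import math
from collections import defaultdict


def cmi_from_pmf(p_xyz):
    """Return I(X;Y|Z) in bits from a joint PMF given as {(x, y, z): prob}."""
    p_z = defaultdict(float)
    p_xz = defaultdict(float)
    p_yz = defaultdict(float)
    # Accumulate the marginals that appear inside the logarithm.
    for (x, y, z), p in p_xyz.items():
        p_z[z] += p
        p_xz[(x, z)] += p
        p_yz[(y, z)] += p
    # Sum only over outcomes with positive probability (0 log 0 is taken as 0).
    return sum(
        p * math.log2(p_z[z] * p / (p_xz[(x, z)] * p_yz[(y, z)]))
        for (x, y, z), p in p_xyz.items()
        if p > 0
    )


def flip(a, b, eps):
    """Probability that bit a is observed as bit b through a channel flipping with prob. eps."""
    return eps if a != b else 1.0 - eps


# Assumed example: a Markov chain X -> Z -> Y, so X and Y are independent given Z.
markov = {
    (x, y, z): 0.5 * flip(x, z, 0.1) * flip(z, y, 0.2)
    for x in (0, 1)
    for y in (0, 1)
    for z in (0, 1)
}
print(cmi_from_pmf(markov))  # zero up to floating-point rounding
```

The chain is chosen because conditional independence given <math>Z</math> makes every log term vanish, which gives a simple sanity check.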

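Line 74: Line 72: −

+

==In terms of PDFs for continuous distributions==

−

== In terms of PDFs for continuous distributions ==

Line 90: Line 87: −

where the marginal, joint, and/or conditional '''probability density functions ''' are denoted by <math>p</math> with the appropriate subscript. This can be simplified as:

+

where the marginal, joint, and/or conditional '''probability density functions ''' are denoted by <math>p</math> with the appropriate subscript. This can be simplified as: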

Line 106: Line 103: −

+

==Some identities==

−

== Some identities ==

Line 143: Line 139:

:<math>I(X;Y|Z) = H(Z|X) + H(X) + H(Z|Y) + H(Y) - H(Z|X,Y) - H(X,Y) - H(Z)

= I(X;Y) + H(Z|X) + H(Z|Y) - H(Z|X,Y) - H(Z)</math>
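Both lines of this identity can be checked numerically on a small example. The sketch below is only an illustration under assumptions: the joint PMF is an arbitrary made-up table over three bits, the helper names are hypothetical, and conditional entropies are obtained from <math>H(A|B) = H(A,B) - H(B)</math>.

```python
import math
from itertools import product


def marginal(p_xyz, keep):
    """Marginalize a {(x, y, z): prob} table onto the coordinate indices in keep."""
    out = {}
    for outcome, p in p_xyz.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def H(p_xyz, *keep):
    """Shannon entropy in bits of the marginal on the given coordinates."""
    return -sum(q * math.log2(q) for q in marginal(p_xyz, keep).values() if q > 0)


X, Y, Z = 0, 1, 2
# Arbitrary joint PMF over three bits (assumed values, they sum to 1).
weights = [0.10, 0.05, 0.15, 0.20, 0.05, 0.25, 0.10, 0.10]
p = dict(zip(product((0, 1), repeat=3), weights))

# Conditional entropies via H(A|B) = H(A,B) - H(B).
H_Z_given_X = H(p, X, Z) - H(p, X)
H_Z_given_Y = H(p, Y, Z) - H(p, Y)
H_Z_given_XY = H(p, X, Y, Z) - H(p, X, Y)

# Reference value: I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z).
cmi = H(p, X, Z) + H(p, Y, Z) - H(p, X, Y, Z) - H(p, Z)

# First line of the identity.
line1 = (H_Z_given_X + H(p, X) + H_Z_given_Y + H(p, Y)
         - H_Z_given_XY - H(p, X, Y) - H(p, Z))

# Second line, using I(X;Y) = H(X) + H(Y) - H(X,Y).
mi_xy = H(p, X) + H(p, Y) - H(p, X, Y)
line2 = mi_xy + H_Z_given_X + H_Z_given_Y - H_Z_given_XY - H(p, Z)

print(cmi, line1, line2)  # the three values agree up to rounding
```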

+

Line 164: Line 161: −

+

==More general definition==

−

== More general definition==

−

A more general definition of conditional mutual information, applicable to random variables with continuous or other arbitrary distributions, depends on the concept of ''' regular conditional probability '''. (See also<ref>[http://planetmath.org/encyclopedia/ConditionalProbabilityMeasure.html Regular Conditional Probability] on [http://planetmath.org/ PlanetMath]</ref><ref>D. Leao, Jr. et al. ''Regular conditional probability, disintegration of probability and Radon spaces.'' Proyecciones. Vol. 23, No. 1, pp. 15–29, May 2004, Universidad Católica del Norte, Antofagasta, Chile [http://www.scielo.cl/pdf/proy/v23n1/art02.pdf PDF]</ref>.)

+

A more general definition of conditional mutual information, applicable to random variables with continuous or other arbitrary distributions, depends on the concept of ''' regular conditional probability '''. (See also<ref>[http://planetmath.org/encyclopedia/ConditionalProbabilityMeasure.html Regular Conditional Probability] on [http://planetmath.org/ PlanetMath]</ref><ref>D. Leao, Jr. et al. ''Regular conditional probability, disintegration of probability and Radon spaces.'' Proyecciones. Vol. 23, No. 1, pp. 15–29, May 2004, Universidad Católica del Norte, Antofagasta, Chile [http://www.scielo.cl/pdf/proy/v23n1/art02.pdf PDF]</ref>.)

−

Let <math>(\Omega, \mathcal F, \mathfrak P)</math> be a ''' probability space ''', and let the random variables <math>X</math>, <math>Y</math>, and <math>Z</math> each be defined as a ''' Borel-measurable function ''' from <math>\Omega</math> to some state space endowed with a topological structure.

+

Let <math>(\Omega, \mathcal F, \mathfrak P)</math> be a ''' probability space ''', and let the random variables <math>X</math>, <math>Y</math>, and <math>Z</math> each be defined as a ''' Borel-measurable function ''' from <math>\Omega</math> to some state space endowed with a topological structure.

−

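Consider the ''' Borel measure''' (on the σ-algebra generated by the open sets) in the state space of each random variable, defined by assigning each Borel set the <math>\mathfrak P</math>-measure of its preimage in <math>\mathcal F</math>. This is called the ''' pushforward measure ''' <math>X _* \mathfrak P = \mathfrak P\big(X^{-1}(\cdot)\big)</math>. The support of a random variable is defined to be the topological support of this measure, i.e. <math>\mathrm{supp}\,X = \mathrm{supp}\,X _* \mathfrak P</math>.

+

Consider the ''' Borel measure''' (on the σ-algebra generated by the open sets) in the state space of each random variable, defined by assigning each Borel set the <math>\mathfrak P</math>-measure of its preimage in <math>\mathcal F</math>. This is called the ''' pushforward measure ''' <math>X _* \mathfrak P = \mathfrak P\big(X^{-1}(\cdot)\big)</math>. The support of a random variable is defined to be the topological support of this measure, i.e. <math>\mathrm{supp}\,X = \mathrm{supp}\,X _* \mathfrak P</math>.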

−

Now we can formally define the ''' conditional probability measure ''' given the value of one (or, via the ''' product topology ''', more) of the random variables. Let <math>M</math> be a measurable subset of <math>\Omega</math> (i.e. <math>M \in \mathcal F</math>), and let <math>x \in \mathrm{supp}\,X</math>. Then, using the ''' disintegration theorem ''':

+

Now we can formally define the '''conditional probability measure ''' given the value of one (or, via the ''' product topology ''', more) of the random variables. Let <math>M</math> be a measurable subset of <math>\Omega</math> (i.e. <math>M \in \mathcal F</math>), and let <math>x \in \mathrm{supp}\,X</math>. Then, using the ''' disintegration theorem ''':

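Line 195: Line 191: −

where the limit is taken over the open neighborhoods <math>U</math> of <math>x</math>, as they are allowed to become arbitrarily small with respect to ''' set inclusion'''.

+

where the limit is taken over the open neighborhoods <math>U</math> of <math>x</math>, as they are allowed to become arbitrarily small with respect to '''set inclusion'''.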

−

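Finally, we can define the conditional mutual information via ''' Lebesgue integration''':

+

Finally, we can define the conditional mutual information via ''' Lebesgue integration''':

Line 215: Line 211: −

where the integrand is the logarithm of a ''' Radon–Nikodym derivative''' involving some of the conditional probability measures we have just defined.

+

where the integrand is the logarithm of a ''' Radon–Nikodym derivative''' involving some of the conditional probability measures we have just defined.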

−

== Note on notation ==

+

==Note on notation==

−

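In an expression such as <math>I(A;B|C)</math>, <math>A</math>, <math>B</math>, and <math>C</math> need not be limited to representing individual random variables; they may also represent the joint distribution of any collection of random variables defined on the same probability space. As is common in probability theory, we may use the comma to denote such a joint distribution, e.g. <math>I(A_0,A_1;B_1,B_2,B_3|C_0,C_1)</math>. Hence the use of the semicolon (or occasionally a colon or a wedge <math>\wedge</math>) to separate the principal arguments of the mutual information symbol. (No such distinction is necessary in the symbol for joint entropy, since the ''' joint entropy''' of any number of random variables is the same as the entropy of their joint distribution.)

+

In an expression such as <math>I(A;B|C)</math>, <math>A</math>, <math>B</math>, and <math>C</math> need not be limited to representing individual random variables; they may also represent the joint distribution of any collection of random variables defined on the same probability space. As is common in probability theory, we may use the comma to denote such a joint distribution, e.g. <math>I(A_0,A_1;B_1,B_2,B_3|C_0,C_1)</math>. Hence the use of the semicolon (or occasionally a colon or a wedge <math>\wedge</math>) to separate the principal arguments of the mutual information symbol. (No such distinction is necessary in the symbol for joint entropy, since the ''' joint entropy''' of any number of random variables is the same as the entropy of their joint distribution.)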

−

+

==Properties==

−

== Properties==

===Nonnegativity===

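Line 240: Line 234: −

=== Interaction information ===

+

===Interaction information===

−

Conditioning on a third random variable may either increase or decrease the ''' mutual information Mutual information ''': that is, the difference <math>I(X;Y) - I(X;Y|Z)</math>, called the ''' interaction information ''' (not to be confused with mutual information), may be positive, negative, or zero. This is the case even when the random variables are pairwise independent. Such is the case when:

+

Conditioning on a third random variable may either increase or decrease the ''' mutual information''': that is, the difference <math>I(X;Y) - I(X;Y|Z)</math>, called the ''' interaction information ''' (not to be confused with mutual information), may be positive, negative, or zero. This is the case even when the random variables are pairwise independent. Such is the case when: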
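As one concrete illustration of the sign behaviour (examples assumed for this sketch, not necessarily the case the article itself lists), take <math>X</math> and <math>Y</math> to be independent fair bits with <math>Z = X \oplus Y</math>, and compare with the case where all three variables are copies of one fair bit. The helper names below are hypothetical, and entropies are in bits.

```python
import math


def H(p_xyz, *keep):
    """Shannon entropy in bits of the marginal of {(x, y, z): prob} on indices keep."""
    marg = {}
    for outcome, p in p_xyz.items():
        key = tuple(outcome[i] for i in keep)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(q * math.log2(q) for q in marg.values() if q > 0)


def mutual_information(p):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return H(p, 0) + H(p, 1) - H(p, 0, 1)


def conditional_mutual_information(p):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return H(p, 0, 2) + H(p, 1, 2) - H(p, 0, 1, 2) - H(p, 2)


def interaction_information(p):
    """The difference I(X;Y) - I(X;Y|Z) discussed in the text."""
    return mutual_information(p) - conditional_mutual_information(p)


# X, Y independent fair bits, Z = X XOR Y: pairwise independent,
# yet conditioning on Z raises the mutual information from 0 to 1 bit.
xor = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}

# X = Y = Z, one shared fair bit: conditioning on Z lowers it from 1 bit to 0.
copy = {(b, b, b): 0.5 for b in (0, 1)}

print(interaction_information(xor))   # negative: conditioning increased I(X;Y)
print(interaction_information(copy))  # positive: conditioning decreased I(X;Y)
```

In the XOR case every pair of variables is independent, so the negative value comes purely from the three-way dependence.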

Line 252: Line 247: −

+

===Chain rule for mutual information===

−

=== Chain rule for mutual information ===

:<math>I(X;Y,Z) = I(X;Z) + I(X;Y|Z)</math>
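The chain rule can be verified numerically by writing each term through joint entropies. This is a sketch under assumptions: the joint PMF below is an arbitrary made-up table over three bits, and the helper name is hypothetical.

```python
import math
from itertools import product


def H(p_xyz, *keep):
    """Shannon entropy in bits of the marginal of {(x, y, z): prob} on indices keep."""
    marg = {}
    for outcome, p in p_xyz.items():
        key = tuple(outcome[i] for i in keep)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(q * math.log2(q) for q in marg.values() if q > 0)


X, Y, Z = 0, 1, 2
# Arbitrary joint PMF over three bits (assumed values, they sum to 1).
weights = [0.10, 0.05, 0.15, 0.20, 0.05, 0.25, 0.10, 0.10]
p = dict(zip(product((0, 1), repeat=3), weights))

# Left-hand side: I(X;Y,Z) = H(X) + H(Y,Z) - H(X,Y,Z).
lhs = H(p, X) + H(p, Y, Z) - H(p, X, Y, Z)

# Right-hand side: I(X;Z) + I(X;Y|Z).
mi_xz = H(p, X) + H(p, Z) - H(p, X, Z)
cmi = H(p, X, Z) + H(p, Y, Z) - H(p, X, Y, Z) - H(p, Z)
rhs = mi_xz + cmi

print(lhs, rhs)  # equal up to floating-point rounding
```

Expanding the right-hand side, the <math>H(Z)</math> and <math>H(X,Z)</math> terms cancel, which is why the two values agree for any joint distribution.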

−

+

==Multivariate mutual information==

−

== Multivariate mutual information ==

Line 281: Line 274: −

+

==References==

−

== References ==

+

−

[[Category:Information theory]]

[[Category:Entropy and information]]

徐勇勋

2 edits
