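Changes

Conditional mutual information (view source)

Revision as of 20:59, 22 August 2021

8,651 bytes removed, 20:59, 22 August 2021

m

No edit summary

Line 1: Line 1:

This entry was translated by Jie and reviewed by Flipped.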

−

~~{{Information theory}}~~

+

−

−

~~In [[probability~~ theory~~]], particularly [[information~~ theory~~]], the~~ '''~~conditional~~ mutual information'''<ref name = Wyner1978>{{cite journal|last=Wyner|first=A. D. |title=A definition of conditional mutual information for arbitrary ensembles|url=|journal=Information and Control|year=1978|volume=38|issue=1|pages=51–59|doi=10.1016/s0019-9958(78)90026-8|doi-access=free}}</ref><ref name = Dobrushin1959>{{cite journal|last=Dobrushin|first=R. L. |title=General formulation of Shannon's main theorem in information theory|journal=Uspekhi Mat. Nauk|year=1959|volume=14|pages=3–104}}</ref> ~~is, in its most basic form, the [[expected value]] of the [[mutual information]] of two random variables given the value of a third.~~

+

In '''probability theory''', and in particular '''information theory''', the '''conditional mutual information'''<ref name = Wyner1978>{{cite journal|last=Wyner|first=A. D. |title=A definition of conditional mutual information for arbitrary ensembles|url=|journal=Information and Control|year=1978|volume=38|issue=1|pages=51–59|doi=10.1016/s0019-9958(78)90026-8|doi-access=free}}</ref><ref name = Dobrushin1959>{{cite journal|last=Dobrushin|first=R. L. |title=General formulation of Shannon's main theorem in information theory|journal=Uspekhi Mat. Nauk|year=1959|volume=14|pages=3–104}}</ref> is, in its most basic form, the expected value of the mutual information of two random variables given the value of a third.


+

== Definition ==

−

~~== Definition 定义 ==~~

−

For random variables <math>X</math>, <math>Y</math>, and <math>Z</math> with [[Support (mathematics)|support sets]] <math>\mathcal{X}</math>, <math>\mathcal{Y}</math> and <math>\mathcal{Z}</math>, we define the conditional mutual information as

For random variables <math>X</math>, <math>Y</math>, and <math>Z</math> with '''support sets''' <math>\mathcal{X}</math>, <math>\mathcal{Y}</math> and <math>\mathcal{Z}</math>, we define the conditional mutual information as:

Line 28: Line 25:

|border colour = #0073CF

|background colour=#F5FFFA}}

+

−

~~This may be written in terms of the expectation operator:~~

This can be written in terms of the expectation operator:

Line 38: Line 36: −

Thus <math>I(X;Y|Z)</math> is the expected (with respect to <math>Z</math>) [[Kullback–Leibler divergence]] from the conditional joint distribution <math>P_{(X,Y)|Z}</math> to the product of the conditional marginals <math>P_{X|Z}</math> and <math>P_{Y|Z}</math>. Compare with the definition of [[mutual information]].

+

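Thus <math>I(X;Y|Z)</math> is the expected (with respect to <math>Z</math>) '''Kullback–Leibler divergence''' from the conditional joint distribution <math>P_{(X,Y)|Z}</math> to the product of the conditional marginals <math>P_{X|Z}</math> and <math>P_{Y|Z}</math>. Compare with the definition of mutual information.

Line 44: Line 43: −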

== ~~In terms of pmf's for discrete distributions~~ Probability mass functions for discrete distributions ==

+

== Probability mass functions for discrete distributions ==

−

For discrete random variables <math>X</math>, <math>Y</math>, and <math>Z</math> with [[Support (mathematics)|support sets]] <math>\mathcal{X}</math>, <math>\mathcal{Y}</math> and <math>\mathcal{Z}</math>, the conditional mutual information <math>I(X;Y|Z)</math> is as follows

For discrete random variables <math>X</math>, <math>Y</math>, and <math>Z</math> with support sets <math>\mathcal{X}</math>, <math>\mathcal{Y}</math> and <math>\mathcal{Z}</math>, the conditional mutual information <math>I(X;Y|Z)</math> is as follows:

Line 57: Line 55: −

~~where the marginal, joint, and/or conditional [[probability mass function]]s are denoted by <math>p</math> with the appropriate subscript. This can be simplified as~~

+

where the marginal, joint, and/or conditional '''probability mass functions''' are denoted by <math>p</math> with the appropriate subscript. This can be simplified as:
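The simplified formula itself is not shown in this diff. As a minimal sketch, assuming the standard simplified form <math>I(X;Y|Z)=\sum_{x,y,z} p_{X,Y,Z}(x,y,z)\log\frac{p_Z(z)\,p_{X,Y,Z}(x,y,z)}{p_{X,Z}(x,z)\,p_{Y,Z}(y,z)}</math>, the sum can be evaluated directly from a joint pmf. The function name and the dictionary representation of the pmf are illustrative choices, not part of the article.

```python
import math
from collections import defaultdict


def conditional_mutual_information(pxyz, base=2.0):
    """I(X;Y|Z) for a joint pmf given as {(x, y, z): probability}."""
    p_z = defaultdict(float)
    p_xz = defaultdict(float)
    p_yz = defaultdict(float)
    # Accumulate the marginals that appear in the simplified formula.
    for (x, y, z), p in pxyz.items():
        p_z[z] += p
        p_xz[(x, z)] += p
        p_yz[(y, z)] += p
    total = 0.0
    for (x, y, z), p in pxyz.items():
        if p > 0.0:  # zero-probability outcomes contribute nothing
            ratio = p_z[z] * p / (p_xz[(x, z)] * p_yz[(y, z)])
            total += p * math.log(ratio, base)
    return total


# Hypothetical example: Y is a copy of a fair bit X, Z is an independent fair bit.
copy_pmf = {(b, b, z): 0.25 for b in (0, 1) for z in (0, 1)}
cmi_copy = conditional_mutual_information(copy_pmf)

# Hypothetical example: three mutually independent fair bits.
indep_pmf = {(x, y, z): 0.125 for x in (0, 1) for y in (0, 1) for z in (0, 1)}
cmi_indep = conditional_mutual_information(indep_pmf)
```

With base 2 the result is in bits. In the first example knowing <math>Z</math> does not change the fact that <math>Y</math> determines <math>X</math>, so the value is one bit; in the second it is zero.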

Line 76: Line 75: −

== ~~In terms of pdf's for continuous distributions~~ Probability density functions for continuous distributions ==

+

== Probability density functions for continuous distributions ==

−

For (absolutely) continuous random variables <math>X</math>, <math>Y</math>, and <math>Z</math> with [[Support (mathematics)|support sets]] <math>\mathcal{X}</math>, <math>\mathcal{Y}</math> and <math>\mathcal{Z}</math>, the conditional mutual information <math>I(X;Y|Z)</math> is as follows

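For (absolutely) continuous random variables <math>X</math>, <math>Y</math>, and <math>Z</math> with support sets <math>\mathcal{X}</math>, <math>\mathcal{Y}</math> and <math>\mathcal{Z}</math>, the conditional mutual information <math>I(X;Y|Z)</math> is as follows:

Line 89: Line 87: −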

~~where the marginal, joint, and/or conditional [[probability density function]]s are denoted by <math>p</math> with the appropriate subscript. This can be simplified as~~

+

where the marginal, joint, and/or conditional '''probability density functions''' are denoted by <math>p</math> with the appropriate subscript. This can be simplified as:

Line 109: Line 108:

== Some identities ==

−

Alternatively, we may write in terms of joint and conditional [[Entropy (information theory)|entropies]] as<ref>{{cite book |last1=Cover |first1=Thomas |author-link1=Thomas M. Cover |last2=Thomas |first2=Joy A. |title=Elements of Information Theory |edition=2nd |location=New York |publisher=[[Wiley-Interscience]] |date=2006 |isbn=0-471-24195-4}}</ref>

+

Alternatively, we may write it in terms of joint and conditional entropies as<ref>{{cite book |last1=Cover |first1=Thomas |author-link1=Thomas M. Cover |last2=Thomas |first2=Joy A. |title=Elements of Information Theory |edition=2nd |location=New York |publisher=[[Wiley-Interscience]] |date=2006 |isbn=0-471-24195-4}}</ref>:
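The identity itself is elided in this diff. As a hedged sketch, assuming the standard forms <math>I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)</math> and <math>I(X;Y|Z) = H(X|Z) - H(X|Y,Z)</math>, the snippet below computes both from a small joint pmf and checks that they agree. The helper names and the probability values are made up for illustration.

```python
import math
from collections import defaultdict


def entropy(pmf, keep):
    """Shannon entropy in bits of the marginal on the coordinates in `keep`."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in keep)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0.0)


def cmi_joint_form(pmf):
    """H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z), with coordinates ordered (x, y, z)."""
    return (entropy(pmf, (0, 2)) + entropy(pmf, (1, 2))
            - entropy(pmf, (0, 1, 2)) - entropy(pmf, (2,)))


def cmi_conditional_form(pmf):
    """H(X|Z) - H(X|Y,Z), using H(A|B) = H(A,B) - H(B)."""
    h_x_given_z = entropy(pmf, (0, 2)) - entropy(pmf, (2,))
    h_x_given_yz = entropy(pmf, (0, 1, 2)) - entropy(pmf, (1, 2))
    return h_x_given_z - h_x_given_yz


# Arbitrary illustrative joint pmf over three bits (probabilities sum to 1).
outcomes = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
probs = [0.10, 0.05, 0.15, 0.20, 0.05, 0.25, 0.10, 0.10]
pmf = dict(zip(outcomes, probs))

via_joint = cmi_joint_form(pmf)
via_conditional = cmi_conditional_form(pmf)
```

The two forms differ only by regrouping the same four entropies, so any difference between `via_joint` and `via_conditional` is floating-point rounding.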

Line 118: Line 117: −

~~This can be rewritten to show its relationship to mutual information~~

+

This can be rewritten to show its relationship to mutual information:

Line 126: Line 126: −

~~usually rearranged as '''the chain rule for mutual information'''~~

+

This is usually rearranged as the '''chain rule for mutual information''':

Line 134: Line 135: −

~~Another equivalent form of the above is<ref>[https://math.stackexchange.com/q/1863993 Decomposition on Math.StackExchange]</ref>~~

+

Another equivalent form of the above is<ref>[https://math.stackexchange.com/q/1863993 Decomposition on Math.StackExchange]</ref>:

Line 143: Line 145: −

~~Like mutual information, conditional mutual information can be expressed as a [[Kullback–Leibler divergence]]:~~

+

Like mutual information, conditional mutual information can be expressed as a Kullback–Leibler divergence:

Line 151: Line 154: −

~~Or as an expected value of simpler Kullback–Leibler divergences:~~

+

Or as an expected value of simpler Kullback–Leibler divergences:

Line 162: Line 166:

== More general definition ==

−

A more general definition of conditional mutual information, applicable to random variables with continuous or other arbitrary distributions, will depend on the concept of '''[[regular conditional probability]]'''. (See also. <ref>[http://planetmath.org/encyclopedia/ConditionalProbabilityMeasure.html Regular Conditional Probability] on [http://planetmath.org/ PlanetMath]</ref><ref>D. Leao, Jr. et al. ''Regular conditional probability, disintegration of probability and Radon spaces.'' Proyecciones. Vol. 23, No. 1, pp. 15–29, May 2004, Universidad Católica del Norte, Antofagasta, Chile [http://www.scielo.cl/pdf/proy/v23n1/art02.pdf PDF]</ref>)

+

A more general definition of conditional mutual information, applicable to random variables with continuous or other arbitrary distributions, depends on the concept of '''regular conditional probability'''. (See also <ref>[http://planetmath.org/encyclopedia/ConditionalProbabilityMeasure.html Regular Conditional Probability] on [http://planetmath.org/ PlanetMath]</ref><ref>D. Leao, Jr. et al. ''Regular conditional probability, disintegration of probability and Radon spaces.'' Proyecciones. Vol. 23, No. 1, pp. 15–29, May 2004, Universidad Católica del Norte, Antofagasta, Chile [http://www.scielo.cl/pdf/proy/v23n1/art02.pdf PDF]</ref>)

Line 168: Line 172: −

Let <math>(\Omega, \mathcal F, \mathfrak P)</math> be a [[probability space]], and let the random variables <math>X</math>, <math>Y</math>, and <math>Z</math> each be defined as a Borel-measurable function from <math>\Omega</math> to some state space endowed with a topological structure.

Let <math>(\Omega, \mathcal F, \mathfrak P)</math> be a '''probability space''', and let the random variables <math>X</math>, <math>Y</math>, and <math>Z</math> each be defined as a '''Borel-measurable function''' from <math>\Omega</math> to some state space endowed with a topological structure.

Line 174: Line 177: −

Consider the Borel measure (on the σ-algebra generated by the open sets) in the state space of each random variable defined by assigning each Borel set the <math>\mathfrak P</math>-measure of its preimage in <math>\mathcal F</math>. This is called the [[pushforward measure]] <math>X _* \mathfrak P = \mathfrak P\big(X^{-1}(\cdot)\big).</math> The '''support of a random variable''' is defined to be the [[Support (measure theory)|topological support]] of this measure, i.e. <math>\mathrm{supp}\,X = \mathrm{supp}\,X _* \mathfrak P.</math>

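Consider the '''Borel measure''' (on the σ-algebra generated by the open sets) in the state space of each random variable, defined by assigning each Borel set the <math>\mathfrak P</math>-measure of its preimage in <math>\mathcal F</math>. This is called the '''pushforward measure''' <math>X _* \mathfrak P = \mathfrak P\big(X^{-1}(\cdot)\big).</math> The '''support of a random variable''' is defined to be the topological support of this measure, i.e. <math>\mathrm{supp}\,X = \mathrm{supp}\,X _* \mathfrak P.</math>

Line 180: Line 182: −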

Now we can formally define the [[conditional probability distribution|conditional probability measure]] given the value of one (or, via the [[product topology]], more) of the random variables. Let <math>M</math> be a measurable subset of <math>\Omega,</math> (i.e. <math>M \in \mathcal F,</math>) and let <math>x \in \mathrm{supp}\,X.</math> Then, using the [[disintegration theorem]]:

Now we can formally define the '''conditional probability measure''' given the value of one (or, via the '''product topology''', more) of the random variables. Let <math>M</math> be a measurable subset of <math>\Omega</math> (i.e. <math>M \in \mathcal F</math>), and let <math>x \in \mathrm{supp}\,X</math>. Then, using the '''disintegration theorem''':

Line 191: Line 192: −

~~where the limit is taken over the open neighborhoods <math>U</math> of <math>x</math>, as they are allowed to become arbitrarily smaller with respect to [[Subset|set inclusion]].~~

+

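where the limit is taken over the open neighborhoods <math>U</math> of <math>x</math>, as they are allowed to become arbitrarily smaller with respect to '''set inclusion'''.

Line 197: Line 199: −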

~~Finally we can define the conditional mutual information via [[Lebesgue integration]]:~~

Finally, we can define the conditional mutual information via '''Lebesgue integration''':

Line 211: Line 212: −

~~where the integrand is the logarithm of a [[Radon–Nikodym derivative]] involving some of the conditional probability measures we have just defined.~~

+

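where the integrand is the logarithm of a '''Radon–Nikodym derivative''' involving some of the conditional probability measures we have just defined.

Line 217: Line 219: −

== ~~Note on notation~~ Note on notation ==

+

== Note on notation ==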

−

In an expression such as <math>I(A;B|C),</math> <math>A,</math> <math>B,</math> and <math>C</math> need not necessarily be restricted to representing individual random variables, but could also represent the joint distribution of any collection of random variables defined on the same [[probability space]]. As is common in [[probability theory]], we may use the comma to denote such a joint distribution, e.g. <math>I(A_0,A_1;B_1,B_2,B_3|C_0,C_1).</math> Hence the use of the semicolon (or occasionally a colon or even a wedge <math>\wedge</math>) to separate the principal arguments of the mutual information symbol. (No such distinction is necessary in the symbol for [[joint entropy]], since the joint entropy of any number of random variables is the same as the entropy of their joint distribution.)

+

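In an expression such as <math>I(A;B|C)</math>, <math>A</math>, <math>B</math>, and <math>C</math> need not be restricted to representing individual random variables; they can also represent the joint distribution of any collection of random variables defined on the same probability space. As is common in probability theory, we may use the comma to denote such a joint distribution, e.g. <math>I(A_0,A_1;B_1,B_2,B_3|C_0,C_1)</math>. Hence the use of the semicolon (or occasionally a colon or even a wedge <math>\wedge</math>) to separate the principal arguments of the mutual information symbol. (No such distinction is necessary in the symbol for '''joint entropy''', since the joint entropy of any number of random variables is the same as the entropy of their joint distribution.)

Line 224: Line 226: −

== ~~Properties~~ Properties ==

+

== Properties ==

−

===~~Nonnegativity~~ Nonnegativity===

+

===Nonnegativity===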

−

~~It is always true that~~

+

−

~~:<math>I(X;Y|Z) \ge 0</math>,~~

−

for discrete, jointly distributed random variables <math>X</math>, <math>Y</math> and <math>Z</math>. This result has been used as a basic building block for proving other [[inequalities in information theory]], in particular, those known as Shannon-type inequalities. Conditional mutual information is also non-negative for continuous random variables under certain regularity conditions.<ref>{{cite book |last1=Polyanskiy |first1=Yury |last2=Wu |first2=Yihong |title=Lecture notes on information theory |date=2017 |page=30 |url=http://people.lids.mit.edu/yp/homepage/data/itlectures_v5.pdf}}</ref>

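For discrete, jointly distributed random variables <math>X</math>, <math>Y</math> and <math>Z</math>, the following inequality always holds:

Line 240: Line 240: −

=== ~~Interaction information~~ Interaction information ===

+

=== Interaction information ===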

−

Conditioning on a third random variable may either increase or decrease the mutual information: that is, the difference <math>I(X;Y) - I(X;Y|Z)</math>, called the [[interaction information]], may be positive, negative, or zero. This is the case even when random variables are pairwise independent. Such is the case when:

−

~~<math display="block">X \sim \mathrm{Bernoulli}(0.5), Z \sim \mathrm{Bernoulli}(0.5), \quad Y=\left\{\begin{array}{ll} X & \text{if }Z=0\\ 1-X & \text{if }Z=1 \end{array}\right.</math>~~

−

~~in which case <math>X</math>, <math>Y</math> and <math>Z</math> are pairwise independent and in particular <math>I(X;Y)=0</math>, but <math>I(X;Y|Z)=1.</math>~~

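Conditioning on a third random variable may either increase or decrease the '''mutual information''': that is, the difference <math>I(X;Y) - I(X;Y|Z)</math>, called the '''interaction information''' (not to be confused with mutual information), may be positive, negative, or zero. This is the case even when the random variables are pairwise independent. For example, when: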
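A numerical check of such a case can be sketched as follows, using the example from the earlier revision of the article: <math>X</math> and <math>Z</math> are independent fair bits and <math>Y = X</math> if <math>Z = 0</math>, <math>Y = 1 - X</math> if <math>Z = 1</math> (that is, <math>Y</math> is the XOR of <math>X</math> and <math>Z</math>). The helper name `entropy` and the dictionary layout are assumptions made for this illustration.

```python
import math
from collections import defaultdict


def entropy(pmf, keep):
    """Shannon entropy in bits of the marginal on the coordinates in `keep`."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in keep)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0.0)


# Joint pmf over (x, y, z): X and Z are independent fair bits, Y = X XOR Z.
pmf = {(x, x ^ z, z): 0.25 for x in (0, 1) for z in (0, 1)}

# I(X;Y) = H(X) + H(Y) - H(X,Y)
i_xy = entropy(pmf, (0,)) + entropy(pmf, (1,)) - entropy(pmf, (0, 1))

# I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
i_xy_given_z = (entropy(pmf, (0, 2)) + entropy(pmf, (1, 2))
                - entropy(pmf, (0, 1, 2)) - entropy(pmf, (2,)))

# Interaction information as defined in the text: I(X;Y) - I(X;Y|Z)
interaction = i_xy - i_xy_given_z
```

Here `i_xy` is 0 bits and `i_xy_given_z` is 1 bit, so conditioning on <math>Z</math> increases the mutual information and the interaction information is negative.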

Line 258: Line 253: −

=== ~~Chain rule for mutual information~~ Chain rule for mutual information ===

+

=== Chain rule for mutual information ===

:<math>I(X;Y,Z) = I(X;Z) + I(X;Y|Z)</math>
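The chain rule can be checked numerically. The sketch below evaluates each term directly from its log-ratio definition, not from a shared entropy expansion, so the comparison is not trivially true by construction. The function names, the index-tuple convention and the probability values are assumptions made for this illustration.

```python
import math
from collections import defaultdict


def marginal(pmf, keep):
    """Marginal pmf on the coordinates listed in `keep`."""
    out = defaultdict(float)
    for outcome, p in pmf.items():
        out[tuple(outcome[i] for i in keep)] += p
    return dict(out)


def mutual_information(pmf, a, b):
    """I(A;B) in bits; `a` and `b` are tuples of coordinate indices."""
    p_ab, p_a, p_b = marginal(pmf, a + b), marginal(pmf, a), marginal(pmf, b)
    na = len(a)
    return sum(p * math.log2(p / (p_a[k[:na]] * p_b[k[na:]]))
               for k, p in p_ab.items() if p > 0.0)


def conditional_mutual_information(pmf, a, b, c):
    """I(A;B|C) in bits from the simplified pmf formula."""
    p_abc, p_c = marginal(pmf, a + b + c), marginal(pmf, c)
    p_ac, p_bc = marginal(pmf, a + c), marginal(pmf, b + c)
    na, nb = len(a), len(b)
    total = 0.0
    for k, p in p_abc.items():
        if p > 0.0:
            ka, kb, kc = k[:na], k[na:na + nb], k[na + nb:]
            total += p * math.log2(p_c[kc] * p / (p_ac[ka + kc] * p_bc[kb + kc]))
    return total


# Arbitrary illustrative joint pmf over (x, y, z); probabilities sum to 1.
outcomes = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
probs = [0.10, 0.05, 0.15, 0.20, 0.05, 0.25, 0.10, 0.10]
pmf = dict(zip(outcomes, probs))

X, Y, Z = (0,), (1,), (2,)
lhs = mutual_information(pmf, X, Y + Z)                # I(X;Y,Z)
rhs = (mutual_information(pmf, X, Z)                   # I(X;Z)
       + conditional_mutual_information(pmf, X, Y, Z)) # + I(X;Y|Z)
```

Passing `Y + Z` treats the pair <math>(Y,Z)</math> as a single joint variable, matching the comma notation described under "Note on notation".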

Line 265: Line 260:

== Multivariate mutual information ==

−

The conditional mutual information can be used to inductively define a '''multivariate mutual information''' in a set- or [[Information theory and measure theory|measure-theoretic sense]] in the context of '''[[information diagram]]s'''. In this sense we define the multivariate mutual information as follows:

+

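The conditional mutual information can be used to inductively define a '''multivariate mutual information''' in a set- or measure-theoretic sense in the context of '''information diagrams'''. In this sense we define the multivariate mutual information as follows:

Line 273: Line 268: −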

~~Where~~

+

where

Line 280: Line 275: −

This definition is identical to that of [[interaction information]] except for a change in sign in the case of an odd number of random variables. A complication is that this multivariate mutual information (as well as the interaction information) can be positive, negative, or zero, which makes this quantity difficult to interpret intuitively. In fact, for <math>n</math> random variables, there are <math>2^n-1</math> degrees of freedom for how they might be correlated in an information-theoretic sense, corresponding to each non-empty subset of these variables. These degrees of freedom are bounded by various Shannon- and non-Shannon-type [[inequalities in information theory]].

+

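This definition is identical to that of interaction information except for a change in sign in the case of an odd number of random variables. A complication is that this multivariate mutual information (as well as the interaction information) can be positive, negative, or zero, which makes this quantity difficult to interpret intuitively. In fact, for <math>n</math> random variables, there are <math>2^n-1</math> degrees of freedom for how they might be correlated in an information-theoretic sense, corresponding to each non-empty subset of these variables. These degrees of freedom are bounded by various Shannon- and non-Shannon-type inequalities in information theory.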
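The recursive definition is elided in this diff. As a hedged sketch, assuming the usual recursion <math>I(X_1;\ldots;X_{n+1}) = I(X_1;\ldots;X_n) - I(X_1;\ldots;X_n|X_{n+1})</math>, unrolling it gives an alternating sum of joint entropies over all <math>2^n-1</math> non-empty subsets of the variables, which is easy to compute for small discrete examples. The function names below are illustrative, not from the article.

```python
import math
from collections import defaultdict
from itertools import combinations


def subset_entropy(pmf, idx):
    """Joint entropy in bits of the variables whose coordinates are in `idx`."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0.0)


def multivariate_mutual_information(pmf, n):
    """Alternating sum of joint entropies over all non-empty subsets of n variables."""
    total = 0.0
    for k in range(1, n + 1):
        sign = 1.0 if k % 2 == 1 else -1.0  # + for odd-sized subsets, - for even
        for idx in combinations(range(n), k):
            total += sign * subset_entropy(pmf, idx)
    return total


# Y = X XOR Z with X, Z independent fair bits: pairwise independent variables.
xor_pmf = {(x, x ^ z, z): 0.25 for x in (0, 1) for z in (0, 1)}
mmi_xor = multivariate_mutual_information(xor_pmf, 3)

# Three identical copies of one fair bit: fully redundant variables.
copy_pmf = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
mmi_copy = multivariate_mutual_information(copy_pmf, 3)
```

For <math>n = 2</math> the sum reduces to <math>H(X) + H(Y) - H(X,Y) = I(X;Y)</math>. The two three-variable examples give values of opposite sign (−1 bit and +1 bit), which illustrates why the quantity is hard to interpret intuitively.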

徐勇勋

2 edits
