Conditional mutual information

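From 集智百科: complex systems | artificial intelligence | complexity science | complex networks | self-organization
Revision as of 18:23, 27 October 2020 (Tuesday) by Moonscar (talk | contribs) (Moved page from wikipedia:en:Conditional mutual information (history))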


Venn diagram of information theoretic measures for three variables [math]\displaystyle{ x }[/math], [math]\displaystyle{ y }[/math], and [math]\displaystyle{ z }[/math], represented by the lower left, lower right, and upper circles, respectively. The conditional mutual informations [math]\displaystyle{ I(x;z|y) }[/math], [math]\displaystyle{ I(y;z|x) }[/math] and [math]\displaystyle{ I(x;y|z) }[/math] are represented by the yellow, cyan, and magenta regions, respectively.



In probability theory, particularly information theory, the conditional mutual information[1][2] is, in its most basic form, the expected value of the mutual information of two random variables given the value of a third.



Definition

For random variables [math]\displaystyle{ X }[/math], [math]\displaystyle{ Y }[/math], and [math]\displaystyle{ Z }[/math] with support sets [math]\displaystyle{ \mathcal{X} }[/math], [math]\displaystyle{ \mathcal{Y} }[/math] and [math]\displaystyle{ \mathcal{Z} }[/math], we define the conditional mutual information as



[math]\displaystyle{ I(X;Y|Z) = \int_\mathcal{Z} D_{\mathrm{KL}}( P_{(X,Y)|Z} \| P_{X|Z} \otimes P_{Y|Z} ) dP_{Z} }[/math]


This may be written in terms of the expectation operator: [math]\displaystyle{ I(X;Y|Z) = \mathbb{E}_Z [D_{\mathrm{KL}}( P_{(X,Y)|Z} \| P_{X|Z} \otimes P_{Y|Z} )] }[/math].



Thus [math]\displaystyle{ I(X;Y|Z) }[/math] is the expected (with respect to [math]\displaystyle{ Z }[/math]) Kullback–Leibler divergence from the conditional joint distribution [math]\displaystyle{ P_{(X,Y)|Z} }[/math] to the product of the conditional marginals [math]\displaystyle{ P_{X|Z} }[/math] and [math]\displaystyle{ P_{Y|Z} }[/math]. Compare with the definition of mutual information.



In terms of pmf's for discrete distributions

For discrete random variables [math]\displaystyle{ X }[/math], [math]\displaystyle{ Y }[/math], and [math]\displaystyle{ Z }[/math] with support sets [math]\displaystyle{ \mathcal{X} }[/math], [math]\displaystyle{ \mathcal{Y} }[/math] and [math]\displaystyle{ \mathcal{Z} }[/math], the conditional mutual information [math]\displaystyle{ I(X;Y|Z) }[/math] is as follows


[math]\displaystyle{ I(X;Y|Z) = \sum_{z\in \mathcal{Z}} p_Z(z) \sum_{y\in \mathcal{Y}} \sum_{x\in \mathcal{X}} p_{X,Y|Z}(x,y|z) \log \frac{p_{X,Y|Z}(x,y|z)}{p_{X|Z}(x|z)p_{Y|Z}(y|z)} }[/math]

where the marginal, joint, and/or conditional probability mass functions are denoted by [math]\displaystyle{ p }[/math] with the appropriate subscript. This can be simplified as



[math]\displaystyle{ I(X;Y|Z) = \sum_{z\in \mathcal{Z}} \sum_{y\in \mathcal{Y}} \sum_{x\in \mathcal{X}} p_{X,Y,Z}(x,y,z) \log \frac{p_Z(z)p_{X,Y,Z}(x,y,z)}{p_{X,Z}(x,z)p_{Y,Z}(y,z)}. }[/math]
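As a minimal sketch of the simplified triple sum for discrete variables, the following Python function evaluates I(X;Y|Z) for a joint pmf stored as a dictionary. The function name `conditional_mutual_information`, the dictionary layout, and the toy distribution are assumptions made for illustration and do not come from the source.

```python
import math
from collections import defaultdict


def conditional_mutual_information(p_xyz, base=2.0):
    """Return I(X;Y|Z) for a joint pmf given as {(x, y, z): probability}."""
    p_z = defaultdict(float)
    p_xz = defaultdict(float)
    p_yz = defaultdict(float)
    # Marginalise the joint pmf to obtain p_Z, p_{X,Z} and p_{Y,Z}.
    for (x, y, z), p in p_xyz.items():
        p_z[z] += p
        p_xz[(x, z)] += p
        p_yz[(y, z)] += p
    total = 0.0
    for (x, y, z), p in p_xyz.items():
        if p > 0.0:  # outcomes with zero probability contribute nothing
            ratio = p_z[z] * p / (p_xz[(x, z)] * p_yz[(y, z)])
            total += p * math.log(ratio, base)
    return total


# Hypothetical toy distribution: X is a fair bit, Y is a copy of X,
# and Z is an independent fair bit, so I(X;Y|Z) = H(X) = 1 bit.
copy_joint = {(x, x, z): 0.25 for x in (0, 1) for z in (0, 1)}
cmi_copy = conditional_mutual_information(copy_joint)
print(cmi_copy)
```

The `base` argument only sets the unit (2 for bits, `math.e` for nats); the marginals are recomputed from the joint pmf so that the caller cannot pass inconsistent inputs.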


In terms of pdf's for continuous distributions

For (absolutely) continuous random variables [math]\displaystyle{ X }[/math], [math]\displaystyle{ Y }[/math], and [math]\displaystyle{ Z }[/math] with support sets [math]\displaystyle{ \mathcal{X} }[/math], [math]\displaystyle{ \mathcal{Y} }[/math] and [math]\displaystyle{ \mathcal{Z} }[/math], the conditional mutual information [math]\displaystyle{ I(X;Y|Z) }[/math] is as follows


[math]\displaystyle{ I(X;Y|Z) = \int_{\mathcal{Z}} \bigg( \int_{\mathcal{Y}} \int_{\mathcal{X}} \log \left(\frac{p_{X,Y|Z}(x,y|z)}{p_{X|Z}(x|z)p_{Y|Z}(y|z)}\right) p_{X,Y|Z}(x,y|z) dx dy \bigg) p_Z(z) dz }[/math]

where the marginal, joint, and/or conditional probability density functions are denoted by [math]\displaystyle{ p }[/math] with the appropriate subscript. This can be simplified as



[math]\displaystyle{ I(X;Y|Z) = \int_{\mathcal{Z}} \int_{\mathcal{Y}} \int_{\mathcal{X}} \log \left(\frac{p_Z(z)p_{X,Y,Z}(x,y,z)}{p_{X,Z}(x,z)p_{Y,Z}(y,z)}\right) p_{X,Y,Z}(x,y,z) dx dy dz. }[/math]
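The continuous formula rarely has a closed form, but the jointly Gaussian case is a standard exception and makes a convenient sanity check. The sketch below assumes three scalar, jointly Gaussian variables ordered as (X, Y, Z) with a hypothetical covariance matrix. It evaluates I(X;Y|Z) in nats in two ways: from covariance determinants, and from the partial correlation of X and Y given Z. The function names and the example matrix are illustrative assumptions, not part of the source.

```python
import math


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def submatrix(m, idx):
    return [[m[i][j] for j in idx] for i in idx]


def gaussian_cmi(cov):
    """I(X;Y|Z) in nats for jointly Gaussian scalars ordered as (X, Y, Z)."""
    # Differential entropies of Gaussians depend only on covariance determinants;
    # the (2*pi*e) factors cancel in h(X,Z) + h(Y,Z) - h(X,Y,Z) - h(Z).
    det_xz = det2(submatrix(cov, (0, 2)))
    det_yz = det2(submatrix(cov, (1, 2)))
    return 0.5 * math.log(det_xz * det_yz / (det3(cov) * cov[2][2]))


def gaussian_cmi_partial_corr(cov):
    """Same quantity via the partial correlation r of X and Y given Z."""
    sx, sy, sz = (math.sqrt(cov[i][i]) for i in range(3))
    r_xy = cov[0][1] / (sx * sy)
    r_xz = cov[0][2] / (sx * sz)
    r_yz = cov[1][2] / (sy * sz)
    r = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return -0.5 * math.log(1 - r ** 2)


# Hypothetical covariance matrix (unit variances, positive definite).
cov = [[1.0, 0.5, 0.3],
       [0.5, 1.0, 0.4],
       [0.3, 0.4, 1.0]]
print(gaussian_cmi(cov), gaussian_cmi_partial_corr(cov))
```

Both routes should agree up to floating-point rounding. For non-Gaussian densities the triple integral generally has to be estimated numerically.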


Some identities

Alternatively, the conditional mutual information may be written in terms of joint and conditional entropies as[3]


[math]\displaystyle{ I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z) = H(X|Z) - H(X|Y,Z) = H(X|Z)+H(Y|Z)-H(X,Y|Z). }[/math]
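The entropy form is often the easiest one to compute, since it needs only four joint entropies. The sketch below is an illustration under assumed names (`entropy`, `marginal`, `cmi_from_entropies`) and a hypothetical distribution in which X and Y are conditionally independent given Z, so the result should vanish.

```python
import math
from collections import defaultdict


def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0.0)


def marginal(p_xyz, keep):
    """Marginal pmf over the coordinates in `keep` (0 = X, 1 = Y, 2 = Z)."""
    out = defaultdict(float)
    for outcome, p in p_xyz.items():
        out[tuple(outcome[i] for i in keep)] += p
    return dict(out)


def cmi_from_entropies(p_xyz):
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    return (entropy(marginal(p_xyz, (0, 2))) + entropy(marginal(p_xyz, (1, 2)))
            - entropy(p_xyz) - entropy(marginal(p_xyz, (2,))))


# Hypothetical example: Z is a fair bit; X and Y are independent noisy copies
# of Z (each flipped with probability 0.1), so X and Y are conditionally
# independent given Z although they are dependent unconditionally.
flip = 0.1
noisy_joint = {}
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            p_x = flip if x != z else 1 - flip
            p_y = flip if y != z else 1 - flip
            noisy_joint[(x, y, z)] = 0.5 * p_x * p_y

print(cmi_from_entropies(noisy_joint))  # zero up to floating-point rounding
```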

This can be rewritten to show its relationship to mutual information


[math]\displaystyle{ I(X;Y|Z) = I(X;Y,Z) - I(X;Z) }[/math]


usually rearranged as the chain rule for mutual information


[math]\displaystyle{ I(X;Y,Z) = I(X;Z) + I(X;Y|Z) }[/math]


Another equivalent form of the above is[4]


[math]\displaystyle{ I(X;Y|Z) = H(Z|X) + H(X) + H(Z|Y) + H(Y) - H(Z|X,Y) - H(X,Y) - H(Z) = I(X;Y) + H(Z|X) + H(Z|Y) - H(Z|X,Y) - H(Z) }[/math]


Like mutual information, conditional mutual information can be expressed as a Kullback–Leibler divergence:



[math]\displaystyle{ I(X;Y|Z) = D_{\mathrm{KL}}[ p(X,Y,Z) \| p(X|Z)p(Y|Z)p(Z) ]. }[/math]



Or as an expected value of simpler Kullback–Leibler divergences:


[math]\displaystyle{ I(X;Y|Z) = \sum_{z \in \mathcal{Z}} p( Z=z ) D_{\mathrm{KL}}[ p(X,Y|z) \| p(X|z)p(Y|z) ] }[/math],


[math]\displaystyle{ I(X;Y|Z) = \sum_{y \in \mathcal{Y}} p( Y=y ) D_{\mathrm{KL}}[ p(X,Z|y) \| p(X|Z)p(Z|y) ] }[/math].

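The first of these expected-value forms can be sketched directly: compute one Kullback–Leibler divergence per value of Z, then average with weights p(Z=z). The helper names and the toy distribution below are assumptions chosen for illustration. The example is built so that the per-z divergences differ, which shows that I(X;Y|Z) is an average and not a single divergence.

```python
import math
from collections import defaultdict


def kl_divergence(p, q):
    """D_KL(p || q) in bits; q must be positive wherever p is."""
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0.0)


def cmi_as_expected_kl(p_xyz):
    """Return (I(X;Y|Z), {z: divergence at that z}) from {(x, y, z): prob}."""
    p_z = defaultdict(float)
    for (_, _, z), p in p_xyz.items():
        p_z[z] += p
    per_z = {}
    for z0, pz in p_z.items():
        # Conditional joint p(X,Y|z0) and its two conditional marginals.
        cond = {(x, y): p / pz for (x, y, z), p in p_xyz.items() if z == z0}
        p_x = defaultdict(float)
        p_y = defaultdict(float)
        for (x, y), p in cond.items():
            p_x[x] += p
            p_y[y] += p
        product = {(x, y): p_x[x] * p_y[y] for (x, y) in cond}
        per_z[z0] = kl_divergence(cond, product)
    return sum(p_z[z] * d for z, d in per_z.items()), per_z


# Hypothetical example: when Z = 0, Y copies the fair bit X;
# when Z = 1, X and Y are independent fair bits.
mixed_joint = {(0, 0, 0): 0.25, (1, 1, 0): 0.25}
mixed_joint.update({(x, y, 1): 0.125 for x in (0, 1) for y in (0, 1)})
cmi_mixed, per_z_mixed = cmi_as_expected_kl(mixed_joint)
print(cmi_mixed, per_z_mixed)
```

In this example the divergence is 1 bit at Z = 0 and 0 bits at Z = 1, so the weighted average is 0.5 bits.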


More general definition

A more general definition of conditional mutual information, applicable to random variables with continuous or other arbitrary distributions, depends on the concept of regular conditional probability (see also [5][6]).



Let [math]\displaystyle{ (\Omega, \mathcal F, \mathfrak P) }[/math] be a probability space, and let the random variables [math]\displaystyle{ X }[/math], [math]\displaystyle{ Y }[/math], and [math]\displaystyle{ Z }[/math] each be defined as a Borel-measurable function from [math]\displaystyle{ \Omega }[/math] to some state space endowed with a topological structure.



Consider the Borel measure (on the σ-algebra generated by the open sets) in the state space of each random variable defined by assigning each Borel set the [math]\displaystyle{ \mathfrak P }[/math]-measure of its preimage in [math]\displaystyle{ \mathcal F }[/math]. This is called the pushforward measure [math]\displaystyle{ X _* \mathfrak P = \mathfrak P\big(X^{-1}(\cdot)\big). }[/math] The support of a random variable is defined to be the topological support of this measure, i.e. [math]\displaystyle{ \mathrm{supp}\,X = \mathrm{supp}\,X _* \mathfrak P. }[/math]



Now we can formally define the conditional probability measure given the value of one (or, via the product topology, more) of the random variables. Let [math]\displaystyle{ M }[/math] be a measurable subset of [math]\displaystyle{ \Omega, }[/math] (i.e. [math]\displaystyle{ M \in \mathcal F, }[/math]) and let [math]\displaystyle{ x \in \mathrm{supp}\,X. }[/math] Then, using the disintegration theorem:


[math]\displaystyle{ \mathfrak P(M | X=x) = \lim_{U \ni x} \frac {\mathfrak P(M \cap \{X \in U\})} {\mathfrak P(\{X \in U\})} \qquad \textrm{and} \qquad \mathfrak P(M|X) = \int_M d\mathfrak P\big(\omega|X=X(\omega)\big), }[/math]

where the limit is taken over the open neighborhoods [math]\displaystyle{ U }[/math] of [math]\displaystyle{ x }[/math], as they are allowed to become arbitrarily smaller with respect to set inclusion.



Finally we can define the conditional mutual information via Lebesgue integration:


[math]\displaystyle{ I(X;Y|Z) = \int_\Omega \log \Bigl( \frac {d \mathfrak P(\omega|X,Z)\, d\mathfrak P(\omega|Y,Z)} {d \mathfrak P(\omega|Z)\, d\mathfrak P(\omega|X,Y,Z)} \Bigr) d \mathfrak P(\omega), }[/math]

where the integrand is the logarithm of a Radon–Nikodym derivative involving some of the conditional probability measures we have just defined.



Note on notation

In an expression such as [math]\displaystyle{ I(A;B|C), }[/math] [math]\displaystyle{ A, }[/math] [math]\displaystyle{ B, }[/math] and [math]\displaystyle{ C }[/math] need not necessarily be restricted to representing individual random variables, but could also represent the joint distribution of any collection of random variables defined on the same probability space. As is common in probability theory, we may use the comma to denote such a joint distribution, e.g. [math]\displaystyle{ I(A_0,A_1;B_1,B_2,B_3|C_0,C_1). }[/math] Hence the use of the semicolon (or occasionally a colon or even a wedge [math]\displaystyle{ \wedge }[/math]) to separate the principal arguments of the mutual information symbol. (No such distinction is necessary in the symbol for joint entropy, since the joint entropy of any number of random variables is the same as the entropy of their joint distribution.)



Properties

Nonnegativity

It is always true that


[math]\displaystyle{ I(X;Y|Z) \ge 0 }[/math],


for discrete, jointly distributed random variables [math]\displaystyle{ X }[/math], [math]\displaystyle{ Y }[/math] and [math]\displaystyle{ Z }[/math]. This result has been used as a basic building block for proving other inequalities in information theory, in particular, those known as Shannon-type inequalities. Conditional mutual information is also non-negative for continuous random variables under certain regularity conditions.[7]



Interaction information

Conditioning on a third random variable may either increase or decrease the mutual information: that is, the difference [math]\displaystyle{ I(X;Y) - I(X;Y|Z) }[/math], called the interaction information, may be positive, negative, or zero. This can happen even when the random variables are pairwise independent. For example, if [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Z }[/math] are independent and [math]\displaystyle{ X \sim \mathrm{Bernoulli}(0.5), Z \sim \mathrm{Bernoulli}(0.5), \quad Y=\left\{\begin{array}{ll} X & \text{if }Z=0\\ 1-X & \text{if }Z=1 \end{array}\right. }[/math] then [math]\displaystyle{ X }[/math], [math]\displaystyle{ Y }[/math] and [math]\displaystyle{ Z }[/math] are pairwise independent and in particular [math]\displaystyle{ I(X;Y)=0 }[/math], but [math]\displaystyle{ I(X;Y|Z)=1 }[/math] bit (using base-2 logarithms).

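The Bernoulli example can be checked numerically. The sketch below assumes X and Z are independent fair bits, builds the joint pmf of (X, Y, Z), and evaluates I(X;Y), I(X;Y|Z), and their difference from joint entropies in bits. The helper names are illustrative assumptions.

```python
import math
from collections import defaultdict

# Joint pmf of (X, Y, Z): X and Z are independent fair bits;
# Y = X if Z = 0, otherwise Y = 1 - X.
joint = {}
for x in (0, 1):
    for z in (0, 1):
        y = x if z == 0 else 1 - x
        joint[(x, y, z)] = 0.25


def marginal(pmf, keep):
    out = defaultdict(float)
    for outcome, p in pmf.items():
        out[tuple(outcome[i] for i in keep)] += p
    return out


def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0.0)


def H(*keep):
    """Joint entropy in bits of the listed coordinates (0 = X, 1 = Y, 2 = Z)."""
    return entropy(marginal(joint, keep))


mi_xy = H(0) + H(1) - H(0, 1)                     # I(X;Y)
cmi_xy_z = H(0, 2) + H(1, 2) - H(0, 1, 2) - H(2)  # I(X;Y|Z)
interaction = mi_xy - cmi_xy_z                    # I(X;Y) - I(X;Y|Z)
print(mi_xy, cmi_xy_z, interaction)
```

Here conditioning increases the mutual information from 0 to 1 bit, so the interaction information as defined above is negative.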


Chain rule for mutual information

[math]\displaystyle{ I(X;Y,Z) = I(X;Z) + I(X;Y|Z) }[/math]

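A quick numerical check of the chain rule is sketched below. Both sides are computed from the pmf definitions and not from entropies, so the check is not a tautology. The function names, the alphabet sizes, and the seeded pseudo-random pmf are illustrative assumptions.

```python
import math
import random
from collections import defaultdict
from itertools import product


def marginal(pmf, keep):
    out = defaultdict(float)
    for outcome, p in pmf.items():
        out[tuple(outcome[i] for i in keep)] += p
    return out


def mutual_information(pmf, a, b):
    """I(A;B) in bits; a and b are tuples of coordinate indices."""
    p_ab, p_a, p_b = marginal(pmf, a + b), marginal(pmf, a), marginal(pmf, b)
    return sum(p * math.log2(p / (p_a[k[:len(a)]] * p_b[k[len(a):]]))
               for k, p in p_ab.items() if p > 0.0)


def conditional_mutual_information(pmf, a, b, c):
    """I(A;B|C) in bits from the pmf form of the definition."""
    p_abc, p_ac = marginal(pmf, a + b + c), marginal(pmf, a + c)
    p_bc, p_c = marginal(pmf, b + c), marginal(pmf, c)
    total = 0.0
    for k, p in p_abc.items():
        if p > 0.0:
            ka = k[:len(a)]
            kb = k[len(a):len(a) + len(b)]
            kc = k[len(a) + len(b):]
            total += p * math.log2(p_c[kc] * p / (p_ac[ka + kc] * p_bc[kb + kc]))
    return total


# Arbitrary strictly positive joint pmf on {0,1} x {0,1,2} x {0,1};
# the seed is an illustrative choice.
rng = random.Random(0)
weights = {o: rng.random() + 0.1 for o in product((0, 1), (0, 1, 2), (0, 1))}
norm = sum(weights.values())
joint = {o: w / norm for o, w in weights.items()}

X, Y, Z = (0,), (1,), (2,)
lhs = mutual_information(joint, X, Y + Z)  # I(X;Y,Z)
rhs = (mutual_information(joint, X, Z)
       + conditional_mutual_information(joint, X, Y, Z))  # I(X;Z) + I(X;Y|Z)
print(lhs, rhs)
```

The two printed values should agree up to floating-point rounding for any joint pmf, not only this one.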


Multivariate mutual information

The conditional mutual information can be used to inductively define a multivariate mutual information in a set- or measure-theoretic sense in the context of information diagrams. In this sense we define the multivariate mutual information as follows:


[math]\displaystyle{ I(X_1;\ldots;X_{n+1}) = I(X_1;\ldots;X_n) - I(X_1;\ldots;X_n|X_{n+1}), }[/math]


where


[math]\displaystyle{ I(X_1;\ldots;X_n|X_{n+1}) = \mathbb{E}_{X_{n+1}} [D_{\mathrm{KL}}( P_{(X_1,\ldots,X_n)|X_{n+1}} \| P_{X_1|X_{n+1}} \otimes\cdots\otimes P_{X_n|X_{n+1}} )]. }[/math]


This definition is identical to that of interaction information except for a change in sign in the case of an odd number of random variables. A complication is that this multivariate mutual information (as well as the interaction information) can be positive, negative, or zero, which makes this quantity difficult to interpret intuitively. In fact, for [math]\displaystyle{ n }[/math] random variables, there are [math]\displaystyle{ 2^n-1 }[/math] degrees of freedom for how they might be correlated in an information-theoretic sense, corresponding to each non-empty subset of these variables. These degrees of freedom are bounded by various Shannon- and non-Shannon-type inequalities in information theory.

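For three variables the recursion reduces to I(X;Y;Z) = I(X;Y) - I(X;Y|Z). The sketch below evaluates that single step on two hypothetical distributions to show that the sign can go either way. The function names and both examples are assumptions made for illustration, and only the n = 2 step of the recursion is covered.

```python
import math
from collections import defaultdict


def entropy_of(pmf, keep):
    """Joint entropy in bits of the coordinates in `keep` of {(x, y, z): prob}."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in keep)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0.0)


def multivariate_mi3(pmf):
    """I(X;Y;Z) = I(X;Y) - I(X;Y|Z), the n = 2 step of the recursion."""
    def h(*keep):
        return entropy_of(pmf, keep)

    mi_xy = h(0) + h(1) - h(0, 1)
    cmi_xy_z = h(0, 2) + h(1, 2) - h(0, 1, 2) - h(2)
    return mi_xy - cmi_xy_z


# Hypothetical examples: three copies of one fair bit, and Z = X XOR Y
# with X and Y independent fair bits.
copies = {(b, b, b): 0.5 for b in (0, 1)}
xor = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
print(multivariate_mi3(copies), multivariate_mi3(xor))
```

The first distribution gives +1 bit and the second gives -1 bit, which illustrates why the quantity is hard to interpret as an amount of shared information.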


References

  1. Wyner, A. D. (1978). "A definition of conditional mutual information for arbitrary ensembles". Information and Control. 38 (1): 51–59. doi:10.1016/s0019-9958(78)90026-8.
  2. Dobrushin, R. L. (1959). "General formulation of Shannon's main theorem in information theory". Uspekhi Mat. Nauk. 14: 3–104.
  3. Cover, Thomas; Thomas, Joy A. (2006). Elements of Information Theory (2nd ed.). New York: Wiley-Interscience. ISBN 0-471-24195-4. 
  4. Decomposition on Math.StackExchange
  5. Regular Conditional Probability on PlanetMath
  6. Leao, D., Jr.; et al. (2004). "Regular conditional probability, disintegration of probability and Radon spaces". Proyecciones. 23 (1): 15–29. Universidad Católica del Norte, Antofagasta, Chile.
  7. Polyanskiy, Yury; Wu, Yihong (2017). Lecture notes on information theory. p. 30. http://people.lids.mit.edu/yp/homepage/data/itlectures_v5.pdf. 

Category:Information theory


Category:Entropy and information



This page was moved from wikipedia:en:Conditional mutual information. Its edit history can be viewed at 条件互信息/edithistory