更改

添加2,019字节 、 2020年12月13日 (日) 19:50
无编辑摘要
第1行: 第1行: −
此词条Jie翻译。
+
此词条由Jie翻译,由Flipped完成审校。
    
{{Information theory}}
 
   −
[[文件:VennInfo3Var.svg|缩略图|右|以上是三个变量<math>x</math>, <math>y</math>, 和 <math>z</math>信息理论测度的维恩图,分别由左下,右下和上部的圆圈表示。条件互信息<math>I(x;z|y)</math>, <math>I(y;z|x)</math> 和 <math>I(x;y|z)</math>分别由黄色,青色和品红色('''注意:该图颜色标注错误,需要修改''')区域表示。]]
+
[[文件:VennInfo3Var.svg|缩略图|右|以上是三个变量<math>x</math>, <math>y</math>, 和 <math>z</math>信息理论测度的维恩图,分别由左下,右下和上部的圆圈表示。条件互信息<math>I(x;z|y)</math>, <math>I(y;z|x)</math> 和 <math>I(x;y|z)</math>分别由黄色,青色和品红色(''' <font color="#32cd32"> 注意:该图颜色标注错误,需要修改</font> ''')区域表示。]]
    
In [[probability theory]], particularly [[information theory]], the '''conditional mutual information'''<ref name = Wyner1978>{{cite journal|last=Wyner|first=A. D. |title=A definition of conditional mutual information for arbitrary ensembles|url=|journal=Information and Control|year=1978|volume=38|issue=1|pages=51–59|doi=10.1016/s0019-9958(78)90026-8|doi-access=free}}</ref><ref name = Dobrushin1959>{{cite journal|last=Dobrushin|first=R. L. |title=General formulation of Shannon's main theorem in information theory|journal=Uspekhi Mat. Nauk|year=1959|volume=14|pages=3–104}}</ref> is, in its most basic form, the [[expected value]] of the [[mutual information]] of two random variables given the value of a third.
 
   −
在'''<font color="#ff8000"> 概率论Probability theory</font>'''中,特别是与'''<font color="#ff8000"> 信息论Information theory</font>'''相关的情况下,最基本形式的'''<font color="#ff8000"> 条件互信息Conditional mutual information </font>''',是在给定第三个值的两个随机变量间互信息的期望值。
+
在'''<font color="#ff8000"> 概率论Probability theory</font>'''中,特别是与'''<font color="#ff8000"> 信息论Information theory</font>'''相关的情况下,最基本形式的'''<font color="#ff8000"> 条件互信息Conditional mutual information </font>'''<ref name = Wyner1978>{{cite journal|last=Wyner|first=A. D. |title=A definition of conditional mutual information for arbitrary ensembles|url=|journal=Information and Control|year=1978|volume=38|issue=1|pages=51–59|doi=10.1016/s0019-9958(78)90026-8|doi-access=free}}</ref><ref name = Dobrushin1959>{{cite journal|last=Dobrushin|first=R. L. |title=General formulation of Shannon's main theorem in information theory|journal=Uspekhi Mat. Nauk|year=1959|volume=14|pages=3–104}}</ref>,是在给定第三个随机变量取值的条件下,两个随机变量之间互信息的期望值。
      第14行: 第14行:  
For random variables <math>X</math>, <math>Y</math>, and <math>Z</math> with [[Support (mathematics)|support sets]] <math>\mathcal{X}</math>, <math>\mathcal{Y}</math> and <math>\mathcal{Z}</math>, we define the conditional mutual information as
 
   −
对于具有支持集<math>\mathcal{X}</math>, <math>\mathcal{Y}</math> 和 <math>\mathcal{Z}</math>的随机变量<math>X</math>, <math>Y</math>, 和 <math>Z</math>,我们将条件互信息定义为:
+
对于具有'''<font color="#ff8000"> 支撑集 Support sets </font>''' <math>\mathcal{X}</math>, <math>\mathcal{Y}</math> 和 <math>\mathcal{Z}</math>的随机变量<math>X</math>, <math>Y</math> 和 <math>Z</math>,我们将条件互信息定义为:
      第40行: 第40行:  
Thus <math>I(X;Y|Z)</math> is the expected (with respect to <math>Z</math>) [[Kullback–Leibler divergence]] from the conditional joint distribution <math>P_{(X,Y)|Z}</math> to the product of the conditional marginals <math>P_{X|Z}</math> and <math>P_{Y|Z}</math>. Compare with the definition of [[mutual information]].
 
   −
因此,相较于互信息的定义,<math>I(X;Y|Z)</math>可以表达为期望的'''<font color="#ff8000"> Kullback-Leibler散度</font>'''(相对于<math>Z</math>),即从条件联合分布<math>P_{(X,Y)|Z}</math>到条件边际<math>P_{X|Z}</math> 和 <math>P_{Y|Z}</math>的乘积。
+
因此,<math>I(X;Y|Z)</math>是从条件联合分布<math>P_{(X,Y)|Z}</math>到条件边缘分布<math>P_{X|Z}</math> 与 <math>P_{Y|Z}</math>之积的'''<font color="#ff8000"> Kullback-Leibler散度 Kullback–Leibler divergence </font>'''(关于<math>Z</math>)的期望值。可与互信息的定义相比较。
      第59行: 第59行:  
where the marginal, joint, and/or conditional [[probability mass function]]s are denoted by <math>p</math> with the appropriate subscript. This can be simplified as
 
   −
其中边缘概率密度函数,联合概率密度函数,和(或)条件概率质量函数可以由<math>p</math>加上适当的下标表示。这可以简化为:
+
其中边缘概率质量函数,联合概率质量函数,和(或)条件'''<font color="#ff8000">概率质量函数 probability mass function </font>'''可以由<math>p</math>加上适当的下标表示。这可以简化为:
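The simplified discrete form referred to here is not shown in this diff excerpt. As a minimal sketch, assuming it is the standard sum <math>\sum_{x,y,z} p_{X,Y,Z}(x,y,z) \log \frac{p_Z(z)\,p_{X,Y,Z}(x,y,z)}{p_{X,Z}(x,z)\,p_{Y,Z}(y,z)}</math>, it could be evaluated numerically as follows. The function name, the array layout <code>[x, y, z]</code> and the example distribution are illustrative assumptions, not part of the article.

```python
import numpy as np

def conditional_mutual_information(p_xyz, base=2.0):
    """I(X;Y|Z) for a joint PMF stored as an array indexed [x, y, z].

    Hypothetical helper for illustration; assumes a finite alphabet.
    """
    p = np.asarray(p_xyz, dtype=float)
    if p.ndim != 3 or np.any(p < 0) or not np.isclose(p.sum(), 1.0):
        raise ValueError("p_xyz must be a 3-D array of probabilities summing to 1")

    # Marginals, kept broadcastable against the full joint array.
    p_z = p.sum(axis=(0, 1), keepdims=True)   # shape (1, 1, nz)
    p_xz = p.sum(axis=1, keepdims=True)       # shape (nx, 1, nz)
    p_yz = p.sum(axis=0, keepdims=True)       # shape (1, ny, nz)

    # Terms with p(x,y,z) = 0 contribute nothing (0 * log 0 := 0).
    mask = p > 0
    ratio = (p_z * p)[mask] / (p_xz * p_yz)[mask]
    return float(np.sum(p[mask] * np.log(ratio)) / np.log(base))

# Example: Y is a copy of a fair bit X, and Z is an independent fair bit.
p_copy = np.zeros((2, 2, 2))
for x in (0, 1):
    for z in (0, 1):
        p_copy[x, x, z] = 0.25

# Example: three independent fair bits.
p_indep = np.full((2, 2, 2), 1.0 / 8.0)

cmi_copy = conditional_mutual_information(p_copy)    # equals H(X) = 1 bit
cmi_indep = conditional_mutual_information(p_indep)  # equals 0 bits
```

In the first example, knowing <math>Z</math> reveals nothing about <math>X</math> or <math>Y</math>, so the conditional mutual information equals <math>H(X)</math>; in the second, all three variables are independent and the quantity vanishes.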
      第91行: 第91行:  
where the marginal, joint, and/or conditional [[probability density function]]s are denoted by <math>p</math> with the appropriate subscript. This can be simplified as
 
   −
其中边缘概率密度函数,联合概率密度函数,和(或)条件概率密度函数可以由p加上适当的下标表示。这可以简化为
+
其中边缘概率密度函数,联合概率密度函数,和(或)条件'''<font color="#ff8000">概率密度函数 probability density function </font>'''可以由<math>p</math>加上适当的下标表示。这可以简化为:
      第111行: 第111行:  
Alternatively, we may write in terms of joint and conditional [[Entropy (information theory)|entropies]] as<ref>{{cite book |last1=Cover |first1=Thomas |author-link1=Thomas M. Cover |last2=Thomas |first2=Joy A. |title=Elements of Information Theory |edition=2nd |location=New York |publisher=[[Wiley-Interscience]] |date=2006 |isbn=0-471-24195-4}}</ref>
 
   −
同时我们也可以将联合和条件熵写为:
+
或者,我们也可以用联合熵和条件熵将其写为<ref>{{cite book |last1=Cover |first1=Thomas |author-link1=Thomas M. Cover |last2=Thomas |first2=Joy A. |title=Elements of Information Theory |edition=2nd |location=New York |publisher=[[Wiley-Interscience]] |date=2006 |isbn=0-471-24195-4}}</ref>:
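The displayed equation for this step is not part of this diff excerpt. For reference, the standard entropy identities, written here in the notation used above (an assumption about which form the article displays), are:

:<math>I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z) = H(X|Z) - H(X|Y,Z) = H(X|Z) + H(Y|Z) - H(X,Y|Z).</math>

Each equality follows from <math>H(A|B) = H(A,B) - H(B)</math>.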
      第136行: 第136行:  
Another equivalent form of the above is<ref>[https://math.stackexchange.com/q/1863993 Decomposition on Math.StackExchange]</ref>
 
   −
上述的另一种等效形式是:
+
上述式子的另一种等价形式是<ref>[https://math.stackexchange.com/q/1863993 Decomposition on Math.StackExchange]</ref>:
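The equation itself is not shown in this diff excerpt. A likely candidate, assuming the article means the chain-rule decomposition of mutual information, is:

:<math>I(X;Y|Z) = I(X;Y,Z) - I(X;Z).</math>

This is the chain rule <math>I(X;Y,Z) = I(X;Z) + I(X;Y|Z)</math> rearranged.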
      第153行: 第153行:  
Or as an expected value of simpler Kullback–Leibler divergences:
 
   −
或作为更简单的Kullback-Leibler差异的期望值:
+
或作为更简单的Kullback-Leibler散度的期望值:
      第161行: 第161行:       −
== More general definition 其他定义==
+
== More general definition 更一般的定义 ==
 
A more general definition of conditional mutual information, applicable to random variables with continuous or other arbitrary distributions, will depend on the concept of '''[[regular conditional probability]]'''.  (See also <ref>[http://planetmath.org/encyclopedia/ConditionalProbabilityMeasure.html Regular Conditional Probability] on [http://planetmath.org/ PlanetMath]</ref><ref>D. Leao, Jr. et al. ''Regular conditional probability, disintegration of probability and Radon spaces.'' Proyecciones. Vol. 23, No. 1, pp. 15–29, May 2004, Universidad Católica del Norte, Antofagasta, Chile [http://www.scielo.cl/pdf/proy/v23n1/art02.pdf PDF]</ref>)
   −
条件互信息的其他通用定义(适用于具有连续或其他任意分布的随机变量)将取决于正则条件概率的概念。
+
条件互信息的更一般的定义(适用于具有连续或其他任意分布的随机变量)将取决于'''<font color="#ff8000"> 正则条件概率 regular conditional probability </font>'''的概念。(参阅<ref>[http://planetmath.org/encyclopedia/ConditionalProbabilityMeasure.html Regular Conditional Probability] on [http://planetmath.org/ PlanetMath]</ref><ref>D. Leao, Jr. et al. ''Regular conditional probability, disintegration of probability and Radon spaces.'' Proyecciones. Vol. 23, No. 1, pp. 15–29, May 2004, Universidad Católica del Norte, Antofagasta, Chile [http://www.scielo.cl/pdf/proy/v23n1/art02.pdf PDF]</ref>)
      第170行: 第170行:  
Let <math>(\Omega, \mathcal F, \mathfrak P)</math> be a [[probability space]], and let the random variables <math>X</math>, <math>Y</math>, and <math>Z</math> each be defined as a Borel-measurable function from <math>\Omega</math> to some state space endowed with a topological structure.
 
   −
令<math>(\Omega, \mathcal F, \mathfrak P)</math>为一个概率空间,并将随机变量<math>X</math>, <math>Y</math>, 和 <math>Z</math>分别定义为一个从<math>\Omega</math>到具有拓扑结构的状态空间的'''<font color="#ff8000"> 波莱尔可测函数Borel-measurable function </font>'''。
+
令<math>(\Omega, \mathcal F, \mathfrak P)</math>为一个'''<font color="#ff8000"> 概率空间 probability space </font>''',并将随机变量<math>X</math>, <math>Y</math>和 <math>Z</math>分别定义为一个从<math>\Omega</math>到具有拓扑结构的状态空间的'''<font color="#ff8000"> 波莱尔可测函数Borel-measurable function </font>'''。
      第176行: 第176行:  
Consider the Borel measure (on the σ-algebra generated by the open sets) in the state space of each random variable defined by assigning each Borel set the <math>\mathfrak P</math>-measure of its preimage in <math>\mathcal F</math>.  This is called the [[pushforward measure]] <math>X _* \mathfrak P = \mathfrak P\big(X^{-1}(\cdot)\big).</math>  The '''support of a random variable''' is defined to be the [[Support (measure theory)|topological support]] of this measure, i.e. <math>\mathrm{supp}\,X = \mathrm{supp}\,X _* \mathfrak P.</math>
 
   −
考虑到在每个随机变量状态空间中的'''<font color="#ff8000"> 波莱尔测度Borel measure</font>'''(关于开放集生成的σ代数),是由每个波莱尔集分配到的<math>\mathcal F</math>中的原像<math>\mathfrak P</math>测度来确定的。这被称为'''<font color="#ff8000"> 前推测度Pushforward measure </font>''' <math>X _* \mathfrak P = \mathfrak P\big(X^{-1}(\cdot)\big).</math>。随机变量的支撑集定义为该测度的拓扑支撑集,即<math>\mathrm{supp}\,X = \mathrm{supp}\,X _* \mathfrak P.</math>。
+
考虑每个随机变量状态空间中的'''<font color="#ff8000"> 波莱尔测度Borel measure</font>'''(定义在由开集生成的σ代数上),它通过为每个波莱尔集赋予其在<math>\mathcal F</math>中原像的<math>\mathfrak P</math>测度来定义。这被称为'''<font color="#ff8000"> 前推测度 Pushforward measure </font>''' <math>X _* \mathfrak P = \mathfrak P\big(X^{-1}(\cdot)\big)</math>。随机变量的支撑集定义为该测度的拓扑支撑集,即<math>\mathrm{supp}\,X = \mathrm{supp}\,X _* \mathfrak P</math>。
      第182行: 第182行:  
Now we can formally define the [[conditional probability distribution|conditional probability measure]] given the value of one (or, via the [[product topology]], more) of the random variables.  Let <math>M</math> be a measurable subset of <math>\Omega,</math> (i.e. <math>M \in \mathcal F,</math>) and let <math>x \in \mathrm{supp}\,X.</math>  Then, using the [[disintegration theorem]]:
 
   −
现在,我们可以在给定其中一个随机变量值(或通过积拓扑获得更多)的情况下正式定义条件概率测度。令<math>M</math>为<math>\Omega,</math>的可测子集(即<math>M \in \mathcal F,</math>,),令<math>x \in \mathrm{supp}\,X.</math>。然后,使用分解定理:
+
现在,我们可以在给定其中一个(或通过'''<font color="#ff8000"> 积拓扑 product topology </font>'''给定多个)随机变量取值的情况下,正式定义'''<font color="#ff8000"> 条件概率测度 conditional probability measure </font>'''。令<math>M</math>为<math>\Omega</math>的可测子集(即<math>M \in \mathcal F</math>),令<math>x \in \mathrm{supp}\,X</math>。然后,使用'''<font color="#ff8000"> 分解定理 disintegration theorem </font>''':
      第193行: 第193行:  
where the limit is taken over the open neighborhoods <math>U</math> of <math>x</math>, as they are allowed to become arbitrarily smaller with respect to [[Subset|set inclusion]].
 
   −
在<math>x</math>的开放邻域<math>U</math>处采用极限,因为相对于'''<font color="#ff8000"> 集包含Set inclusion</font>''',它们可以任意变小。
+
其中极限取遍<math>x</math>的开邻域<math>U</math>,这些邻域在'''<font color="#ff8000"> 集包含Set inclusion</font>'''关系下可以变得任意小。
      第220行: 第220行:  
In an expression such as <math>I(A;B|C),</math> <math>A,</math> <math>B,</math> and <math>C</math> need not necessarily be restricted to representing individual random variables, but could also represent the joint distribution of any collection of random variables defined on the same [[probability space]].  As is common in [[probability theory]], we may use the comma to denote such a joint distribution, e.g. <math>I(A_0,A_1;B_1,B_2,B_3|C_0,C_1).</math>  Hence the use of the semicolon (or occasionally a colon or even a wedge <math>\wedge</math>) to separate the principal arguments of the mutual information symbol.  (No such distinction is necessary in the symbol for [[joint entropy]], since the joint entropy of any number of random variables is the same as the entropy of their joint distribution.)
 
   −
在诸如<math>I(A;B|C)</math>的表达式中,<math>A</math> <math>B</math> 和 <math>C</math>不限于表示单个随机变量,它们同时可以表示在同一概率空间上定义的任意随机变量集合的联合分布。类似概率论中的表达方式,我们可以使用逗号来表示这种联合分布,例如<math>I(A_0,A_1;B_1,B_2,B_3|C_0,C_1).</math>。因此,使用分号(或有时用冒号或楔形<math>\wedge</math>)来分隔互信息符号的主要参数。(在联合熵的符号中,不需要作这样的区分,因为任意数量随机变量的'''<font color="#ff8000"> 联合熵Joint entropy</font>'''与它们联合分布的熵相同。)
+
在诸如<math>I(A;B|C)</math>的表达式中,<math>A</math>, <math>B</math> 和 <math>C</math>不限于表示单个随机变量,它们同时可以表示在同一概率空间上定义的任意随机变量集合的联合分布。类似概率论中的表达方式,我们可以使用逗号来表示这种联合分布,例如<math>I(A_0,A_1;B_1,B_2,B_3|C_0,C_1)</math>。因此,使用分号(或有时用冒号或楔形<math>\wedge</math>)来分隔互信息符号的主要参数。(在联合熵的符号中,不需要作这样的区分,因为任意数量随机变量的'''<font color="#ff8000"> 联合熵Joint entropy</font>'''与它们联合分布的熵相同。)
      第236行: 第236行:       −
该结果已被用作证明信息理论中其他不等式的基础,尤其是香农不等式。对于某些正则条件下的连续随机变量,条件互信息也是非负的。
+
该结果已被用作证明信息理论中其他不等式的基础,尤其是香农不等式。对于某些正则条件下的连续随机变量,条件互信息也是非负的<ref>{{cite book |last1=Polyanskiy |first1=Yury |last2=Wu |first2=Yihong |title=Lecture notes on information theory |date=2017 |page=30 |url=http://people.lids.mit.edu/yp/homepage/data/itlectures_v5.pdf}}</ref>。
      第282行: 第282行:  
This definition is identical to that of [[interaction information]] except for a change in sign in the case of an odd number of random variables.  A complication is that this multivariate mutual information (as well as the interaction information) can be positive, negative, or zero, which makes this quantity difficult to interpret intuitively.  In fact, for <math>n</math> random variables, there are <math>2^n-1</math> degrees of freedom for how they might be correlated in an information-theoretic sense, corresponding to each non-empty subset of these variables. These degrees of freedom are bounded by various Shannon- and non-Shannon-type [[inequalities in information theory]].
 
   −
该定义与'''<font color="#ff8000"> 交互信息Interaction information</font>'''的定义相同,只是在随机数为奇数的情况下符号发生了变化。一个复杂的问题是,该多元互信息(以及交互信息)可以是正,负或零,这使得其数量难以直观地解释。实际上,对于n个随机变量,存在2n-1个自由度。那么如何在信息理论上将它们关联,并对应于这些变量的每个非空子集,就是解决问题的关键。特别是这些自由度受到信息论中各种香农和非香农不等式的制约。
+
该定义与交互信息的定义相同,只是在随机变量个数为奇数的情况下符号相反。一个复杂之处在于,该多元互信息(以及交互信息)可以为正,为负或为零,这使得该量难以直观地解释。实际上,对于<math>n</math>个随机变量,它们在信息论意义上的相关方式共有<math>2^n-1</math>个自由度,分别对应于这些变量的每个非空子集。这些自由度受到信息论中各种香农型和非香农型不等式的约束。
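A small numerical sketch may make the sign issue concrete. It assumes the three-variable quantity is defined recursively as <math>I(X;Y;Z) = I(X;Y) - I(X;Y|Z)</math> (the defining equation is not shown in this diff excerpt, so this convention is an assumption) and uses the classic XOR example, where <math>X</math> and <math>Y</math> are independent fair bits and <math>Z = X \oplus Y</math>. The helper name <code>entropy</code> is illustrative.

```python
import numpy as np

def entropy(p, base=2.0):
    """Shannon entropy of a PMF given as an array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]  # 0 * log 0 is treated as 0
    return float(-np.sum(p * np.log(p)) / np.log(base))

# Joint PMF p[x, y, z]: X, Y independent fair bits, Z = X XOR Y.
p = np.zeros((2, 2, 2))
for x in (0, 1):
    for y in (0, 1):
        p[x, y, x ^ y] = 0.25

# Entropies of the marginals needed below.
h_x = entropy(p.sum(axis=(1, 2)))
h_y = entropy(p.sum(axis=(0, 2)))
h_z = entropy(p.sum(axis=(0, 1)))
h_xy = entropy(p.sum(axis=2))
h_xz = entropy(p.sum(axis=1))
h_yz = entropy(p.sum(axis=0))
h_xyz = entropy(p)

# I(X;Y) = H(X) + H(Y) - H(X,Y)
i_xy = h_x + h_y - h_xy
# I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
i_xy_given_z = h_xz + h_yz - h_xyz - h_z
# Assumed recursive definition: I(X;Y;Z) = I(X;Y) - I(X;Y|Z)
i_xyz = i_xy - i_xy_given_z

print(f"I(X;Y)   = {i_xy:.3f} bits")
print(f"I(X;Y|Z) = {i_xy_given_z:.3f} bits")
print(f"I(X;Y;Z) = {i_xyz:.3f} bits")
```

Here <math>X</math> and <math>Y</math> share no information on their own, yet conditioning on <math>Z</math> makes each fully determine the other, so under the assumed convention the three-way quantity comes out negative.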
     