{{Short description|Measure of relative information in probability theory}}

{{Information theory}}

[[Venn diagram showing additive and subtractive relationships among various information measures associated with correlated variables <math>X</math> and <math>Y</math>. The area contained by both circles is the joint entropy <math>\Eta(X,Y)</math>. The circle on the left (red and violet) is the individual entropy <math>\Eta(X)</math>, with the red being the conditional entropy <math>\Eta(X|Y)</math>. The circle on the right (blue and violet) is <math>\Eta(Y)</math>, with the blue being <math>\Eta(Y|X)</math>. The violet is the mutual information <math>\operatorname{I}(X;Y)</math>.]]

In [[information theory]], the '''conditional entropy''' quantifies the amount of information needed to describe the outcome of a [[random variable]] <math>Y</math> given that the value of another random variable <math>X</math> is known. Here, information is measured in [[Shannon (unit)|shannon]]s, [[Nat (unit)|nat]]s, or [[Hartley (unit)|hartley]]s. The ''entropy of <math>Y</math> conditioned on <math>X</math>'' is written as <math>\Eta(Y|X)</math>.

== Definition ==

The conditional entropy of <math>Y</math> given <math>X</math> is defined as

{{Equation box 1
|indent =
|title=
|equation = {{NumBlk||<math>\Eta(Y|X)\ = -\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log \frac {p(x,y)} {p(x)}</math>|{{EquationRef|Eq.1}}}}
|cellpadding= 6
|border
|border colour = #0073CF
|background colour=#F5FFFA}}

where <math>\mathcal X</math> and <math>\mathcal Y</math> denote the support sets of <math>X</math> and <math>Y</math>.

''Note:'' By convention, the expressions <math>0 \log 0</math> and <math>0 \log c/0</math> for fixed <math>c > 0</math> are treated as being equal to zero. This is because <math>\lim_{\theta\to0^+} \theta\, \log \,c/\theta = 0</math> and <math>\lim_{\theta\to0^+} \theta\, \log \theta = 0</math>.<ref>{{Cite web|url=http://www.inference.org.uk/mackay/itprnn/book.html|title=David MacKay: Information Theory, Pattern Recognition and Neural Networks: The Book|website=www.inference.org.uk|access-date=2019-10-25}}</ref> <!-- because p(x,y) could still equal 0 even if p(x) != 0 and p(y) != 0. What about p(x,y)=p(x)=0? -->
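
To make Eq.1 concrete, here is a minimal numerical sketch, not part of the original article: it computes <math>\Eta(Y|X)</math> from a joint probability table. The function name and the example table are our own illustrative choices, and the <math>0 \log 0</math> convention above is applied by zeroing out empty cells.

<syntaxhighlight lang="python">
import numpy as np

def conditional_entropy(joint):
    """H(Y|X) in bits from a joint pmf table joint[x, y] (rows: x, columns: y).

    Implements Eq.1: H(Y|X) = -sum_{x,y} p(x,y) * log2(p(x,y)/p(x)),
    with the convention that p(x,y) = 0 terms contribute zero.
    """
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1, keepdims=True)                 # marginal p(x)
    # p(y|x), with zero-probability cells mapped to 1 so that log2(...) = 0 there
    cond = np.where(joint > 0, joint / np.where(p_x > 0, p_x, 1.0), 1.0)
    return -np.sum(joint * np.log2(cond))

# A small joint pmf over X in {0, 1} and Y in {0, 1}:
p = np.array([[0.375, 0.125],
              [0.125, 0.375]])
print(conditional_entropy(p))   # ~0.811 bits; H(Y) alone would be 1 bit
</syntaxhighlight>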

Intuitive explanation of the definition:

According to the definition, <math>\displaystyle H( Y|X) =\mathbb{E}( \ f( X,Y) \ )</math> where <math>\displaystyle f:( x,y) \ \rightarrow -\log( \ p( y|x) \ ) .</math> <math>\displaystyle f</math> associates to <math>\displaystyle ( x,y)</math> the information content of <math>\displaystyle ( Y=y)</math> given <math>\displaystyle (X=x)</math>, which is the amount of information needed to describe the event <math>\displaystyle (Y=y)</math> given <math>(X=x)</math>. According to the law of large numbers, <math>\displaystyle H(Y|X)</math> is the arithmetic mean of a large number of independent realizations of <math>\displaystyle f(X,Y)</math>.
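
To illustrate the law-of-large-numbers reading, the sketch below, our own addition with an assumed joint table, samples many pairs <math>(x,y)</math> and averages <math>-\log_2 p(y|x)</math>; the sample mean approaches <math>\Eta(Y|X)</math>.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Assumed joint pmf p[x, y] over X in {0, 1} and Y in {0, 1}
p = np.array([[0.375, 0.125],
              [0.125, 0.375]])
p_y_given_x = p / p.sum(axis=1, keepdims=True)    # conditional table p(y|x)

# Draw (x, y) pairs from the joint and average f(x, y) = -log2 p(y|x)
flat = rng.choice(p.size, size=200_000, p=p.ravel())
x, y = np.unravel_index(flat, p.shape)
f = -np.log2(p_y_given_x[x, y])                   # information content of (Y=y) given (X=x)
print(f.mean())                                   # ~0.811 bits, approaching H(Y|X)
</syntaxhighlight>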

== Motivation ==

Let <math>\Eta(Y|X=x)</math> be the [[Shannon Entropy|entropy]] of the discrete random variable <math>Y</math> conditioned on the discrete random variable <math>X</math> taking a certain value <math>x</math>. Denote the support sets of <math>X</math> and <math>Y</math> by <math>\mathcal X</math> and <math>\mathcal Y</math>. Let <math>Y</math> have [[probability mass function]] <math>p_Y{(y)}</math>. The unconditional entropy of <math>Y</math> is calculated as <math>\Eta(Y) := \mathbb{E}[\operatorname{I}(Y)]</math>, i.e.

:<math>\Eta(Y) = \sum_{y\in\mathcal Y} {\mathrm{Pr}(Y=y)\,\mathrm{I}(y)} = -\sum_{y\in\mathcal Y} {p_Y(y) \log_2{p_Y(y)}},</math>

where <math>\operatorname{I}(y_i)</math> is the [[information content]] of the [[Outcome (probability)|outcome]] of <math>Y</math> taking the value <math>y_i</math>. The entropy of <math>Y</math> conditioned on <math>X</math> taking the value <math>x</math> is defined analogously by [[conditional expectation]]:

:<math>\Eta(Y|X=x) = -\sum_{y\in\mathcal Y} {\Pr(Y = y|X=x) \log_2{\Pr(Y = y|X=x)}}.</math>

Note that <math>\Eta(Y|X)</math> is the result of averaging <math>\Eta(Y|X=x)</math> over all possible values <math>x</math> that <math>X</math> may take. Also, if the above sum is taken over a sample <math>y_1, \dots, y_n</math>, the expected value <math>E_X[ \Eta(y_1, \dots, y_n \mid X = x)]</math> is known in some domains as '''equivocation'''.<ref>{{cite journal|author1=Hellman, M.|author2=Raviv, J.|year=1970|title=Probability of error, equivocation, and the Chernoff bound|journal=IEEE Transactions on Information Theory|volume=16|issue=4|pp=368-372}}</ref>

Given [[Discrete random variable|discrete random variables]] <math>X</math> with image <math>\mathcal X</math> and <math>Y</math> with image <math>\mathcal Y</math>, the conditional entropy of <math>Y</math> given <math>X</math> is defined as the weighted sum of <math>\Eta(Y|X=x)</math> for each possible value of <math>x</math>, using <math>p(x)</math> as the weights:<ref name=cover1991>{{cite book|isbn=0-471-06259-6|year=1991|authorlink1=Thomas M. Cover|author1=T. Cover|author2=J. Thomas|title=Elements of Information Theory|url=https://archive.org/details/elementsofinform0000cove|url-access=registration}}</ref>{{rp|15}}

:<math>
\begin{align}
\Eta(Y|X)\ &\equiv \sum_{x\in\mathcal X}\,p(x)\,\Eta(Y|X=x)\\
& =-\sum_{x\in\mathcal X} p(x)\sum_{y\in\mathcal Y}\,p(y|x)\,\log\, p(y|x)\\
& =-\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}\,p(x,y)\,\log\,p(y|x)\\
& =-\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log\,p(y|x)\\
& =-\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log \frac {p(x,y)} {p(x)} \\
& = \sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log \frac {p(x)} {p(x,y)}.
\end{align}
</math>
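
The weighted-sum form above and the double sum in Eq.1 agree, which can be spot-checked numerically. A small sketch of our own, reusing the assumed joint table from the earlier snippets (each snippet is self-contained):

<syntaxhighlight lang="python">
import numpy as np

p = np.array([[0.375, 0.125],            # assumed joint pmf p(x, y)
              [0.125, 0.375]])
p_x = p.sum(axis=1)

# Weighted sum of per-value entropies: sum_x p(x) * H(Y|X=x) ...
weighted = 0.0
for i, px in enumerate(p_x):
    cond = p[i] / px                     # p(y | X = x_i)
    weighted += px * -np.sum(cond * np.log2(cond))

# ... equals the double-sum form of Eq.1
direct = -np.sum(p * np.log2(p / p_x[:, None]))
print(np.isclose(weighted, direct))      # True
</syntaxhighlight>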

<!-- This paragraph is incorrect; the last line is not the KL divergence between any two distributions, since p(x) is [in general] not a valid distribution over the domains of X and Y. The last formula above is the [[Kullback-Leibler divergence]], also known as relative entropy. Relative entropy is always positive, and vanishes if and only if <math>p(x,y) = p(x)</math>. This is when knowing <math>x</math> tells us everything about <math>y</math>. ADDED: Could this comment be out of date since the KL divergence is not mentioned above? November 2014 -->

==Properties==

===Conditional entropy equals zero===

<math>\Eta(Y|X)=0</math> if and only if the value of <math>Y</math> is completely determined by the value of <math>X</math>.

===Conditional entropy of independent random variables===

Conversely, <math>\Eta(Y|X) = \Eta(Y)</math> if and only if <math>Y</math> and <math>X</math> are [[independent random variables]].

===Chain rule===

Assume that the combined system determined by two random variables <math>X</math> and <math>Y</math> has [[joint entropy]] <math>\Eta(X,Y)</math>, that is, we need <math>\Eta(X,Y)</math> bits of information on average to describe its exact state. Now if we first learn the value of <math>X</math>, we have gained <math>\Eta(X)</math> bits of information. Once <math>X</math> is known, we only need <math>\Eta(X,Y)-\Eta(X)</math> bits to describe the state of the whole system. This quantity is exactly <math>\Eta(Y|X)</math>, which gives the ''chain rule'' of conditional entropy:

:<math>\Eta(Y|X)\, = \, \Eta(X,Y)- \Eta(X).</math><ref name=cover1991 />{{rp|17}}

The chain rule follows from the above definition of conditional entropy:

:<math>\begin{align}
\Eta(Y|X) &= \sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log \left(\frac{p(x)}{p(x,y)} \right) \\[4pt]
&= \sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)(\log (p(x))-\log (p(x,y))) \\[4pt]
&= -\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log (p(x,y)) + \sum_{x\in\mathcal X, y\in\mathcal Y}{p(x,y)\log(p(x))} \\[4pt]
& = \Eta(X,Y) + \sum_{x \in \mathcal X} p(x)\log (p(x) ) \\[4pt]
& = \Eta(X,Y) - \Eta(X).
\end{align}</math>

In general, a chain rule for multiple random variables holds:

:<math> \Eta(X_1,X_2,\ldots,X_n) = \sum_{i=1}^n \Eta(X_i | X_1, \ldots, X_{i-1}) </math><ref name=cover1991 />{{rp|22}}

It has a similar form to the [[Chain rule (probability)|chain rule]] in probability theory, except that addition instead of multiplication is used.
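
Both forms of the chain rule are easy to verify numerically. The following sketch, our own addition, draws an arbitrary joint table and checks <math>\Eta(Y|X) = \Eta(X,Y) - \Eta(X)</math> against the definition:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
p = rng.random((3, 4))
p /= p.sum()                              # arbitrary joint pmf p(x, y), all entries > 0

def H(q):
    """Shannon entropy in bits of a pmf given as any array of probabilities."""
    q = np.asarray(q).ravel()
    return -np.sum(q * np.log2(q))

chain = H(p) - H(p.sum(axis=1))           # H(X,Y) - H(X)
direct = -np.sum(p * np.log2(p / p.sum(axis=1, keepdims=True)))
print(np.isclose(chain, direct))          # True
</syntaxhighlight>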

===Bayes' rule===

[[Bayes' rule]] for conditional entropy states

:<math>\Eta(Y|X) \,=\, \Eta(X|Y) - \Eta(X) + \Eta(Y).</math>

''Proof.'' <math>\Eta(Y|X) = \Eta(X,Y) - \Eta(X)</math> and <math>\Eta(X|Y) = \Eta(Y,X) - \Eta(Y)</math>. Symmetry entails <math>\Eta(X,Y) = \Eta(Y,X)</math>. Subtracting the two equations implies Bayes' rule.
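
The identity can also be spot-checked numerically; this sketch, our own and in the same style as the earlier ones, confirms it on a random joint table:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
p = rng.random((3, 3))
p /= p.sum()                                        # joint pmf p(x, y), all entries > 0

def H(q):
    q = np.asarray(q).ravel()
    return -np.sum(q * np.log2(q))

H_x, H_y, H_xy = H(p.sum(axis=1)), H(p.sum(axis=0)), H(p)
H_y_given_x = H_xy - H_x                            # chain rule
H_x_given_y = H_xy - H_y
print(np.isclose(H_y_given_x, H_x_given_y - H_x + H_y))   # Bayes' rule: True
</syntaxhighlight>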

If <math>Y</math> is [[Conditional independence|conditionally independent]] of <math>Z</math> given <math>X</math> we have:

:<math>\Eta(Y|X,Z) \,=\, \Eta(Y|X).</math>

===Other properties===

For any <math>X</math> and <math>Y</math>:

:<math display="block">\begin{align}
\Eta(Y|X) &\le \Eta(Y) \, \\
\Eta(X,Y) &= \Eta(X|Y) + \Eta(Y|X) + \operatorname{I}(X;Y),\qquad \\
\Eta(X,Y) &= \Eta(X) + \Eta(Y) - \operatorname{I}(X;Y),\, \\
\operatorname{I}(X;Y) &\le \Eta(X),\,
\end{align}</math>

where <math>\operatorname{I}(X;Y)</math> is the [[mutual information]] between <math>X</math> and <math>Y</math>.

For independent <math>X</math> and <math>Y</math>:

:<math>\Eta(Y|X) = \Eta(Y) </math> and <math>\Eta(X|Y) = \Eta(X) \, </math>
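
These relations can be checked on any joint table; the sketch below, our own addition, verifies the inequality and the mutual-information identities at once:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
p = rng.random((4, 4))
p /= p.sum()                               # joint pmf p(x, y), all entries > 0

def H(q):
    q = np.asarray(q).ravel()
    return -np.sum(q * np.log2(q))

H_x, H_y, H_xy = H(p.sum(axis=1)), H(p.sum(axis=0)), H(p)
I_xy = H_x + H_y - H_xy                    # mutual information I(X;Y)
print(H_xy - H_x <= H_y)                                       # H(Y|X) <= H(Y)
print(np.isclose(H_xy, (H_xy - H_y) + (H_xy - H_x) + I_xy))    # H(X,Y) = H(X|Y) + H(Y|X) + I(X;Y)
print(bool(0 <= I_xy <= H_x))                                  # I(X;Y) <= H(X)
</syntaxhighlight>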

Although the specific-conditional entropy <math>\Eta(X|Y=y)</math> can be either less or greater than <math>\Eta(X)</math> for a given [[random variate]] <math>y</math> of <math>Y</math>, <math>\Eta(X|Y)</math> can never exceed <math>\Eta(X)</math>.

== Conditional differential entropy ==

=== Definition ===

The above definition is for discrete random variables. The continuous version of discrete conditional entropy is called ''conditional differential (or continuous) entropy''. Let <math>X</math> and <math>Y</math> be continuous random variables with a [[joint probability density function]] <math>f(x,y)</math>. The differential conditional entropy <math>h(X|Y)</math> is defined as<ref name=cover1991 />{{rp|249}}

{{Equation box 1
|indent =
|title=
|equation = {{NumBlk||<math>h(X|Y) = -\int_{\mathcal X, \mathcal Y} f(x,y)\log f(x|y)\,dx dy</math>|{{EquationRef|Eq.2}}}}
|cellpadding= 6
|border
|border colour = #0073CF
|background colour=#F5FFFA}}

=== Properties ===

In contrast to the conditional entropy for discrete random variables, the conditional differential entropy may be negative.
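
For example, for jointly Gaussian variables the conditional differential entropy has the standard closed form <math>h(X|Y) = \tfrac{1}{2}\log\bigl(2\pi e\,\sigma_X^2(1-\rho^2)\bigr)</math> (in nats, with correlation coefficient <math>\rho</math>), which becomes negative once the conditional variance is small enough. A short sketch of our own, not from the article:

<syntaxhighlight lang="python">
import numpy as np

def h_cond_gaussian(sigma_x, rho):
    """h(X|Y) in nats for jointly Gaussian (X, Y) with correlation rho."""
    return 0.5 * np.log(2 * np.pi * np.e * sigma_x**2 * (1 - rho**2))

print(h_cond_gaussian(1.0, 0.0))   # ~1.419 nats: conditioning on an independent Y does not help
print(h_cond_gaussian(0.1, 0.9))   # ~-1.714 nats: negative, unlike discrete conditional entropy
</syntaxhighlight>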

As in the discrete case there is a chain rule for differential entropy:

:<math>h(Y|X)\,=\,h(X,Y)-h(X)</math><ref name=cover1991 />{{rp|253}}

Notice however that this rule may not be true if the involved differential entropies do not exist or are infinite.

Joint differential entropy is also used in the definition of the [[mutual information]] between continuous random variables:

:<math>\operatorname{I}(X,Y)=h(X)-h(X|Y)=h(Y)-h(Y|X)</math>

<math>h(X|Y) \le h(X)</math> with equality if and only if <math>X</math> and <math>Y</math> are independent.<ref name=cover1991 />{{rp|253}}

===Relation to estimator error===

The conditional differential entropy yields a lower bound on the expected squared error of an [[estimator]]. For any random variable <math>X</math>, observation <math>Y</math> and estimator <math>\widehat{X}</math> the following holds:<ref name=cover1991 />{{rp|255}}

:<math display="block">\mathbb{E}\left[\bigl(X - \widehat{X}{(Y)}\bigr)^2\right] \ge \frac{1}{2\pi e}e^{2h(X|Y)}</math>

This is related to the [[uncertainty principle]] from [[quantum mechanics]].
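
In the jointly Gaussian case the bound is met with equality by the minimum mean-square estimator. The Monte Carlo sketch below is our own addition, assuming <math>X,Y</math> standard normal with correlation <math>\rho</math> and <math>\widehat{X}(Y)=\rho Y</math>:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(4)
rho, n = 0.8, 500_000
y = rng.standard_normal(n)
x = rho * y + np.sqrt(1 - rho**2) * rng.standard_normal(n)   # corr(X, Y) = rho

mse = np.mean((x - rho * y) ** 2)                    # E[(X - X_hat(Y))^2] for X_hat = rho * Y
h_x_given_y = 0.5 * np.log(2 * np.pi * np.e * (1 - rho**2))  # Gaussian h(X|Y) in nats
bound = np.exp(2 * h_x_given_y) / (2 * np.pi * np.e)
print(mse, ">=", bound)                              # both ~0.36: equality in the Gaussian case
</syntaxhighlight>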

==Generalization to quantum theory==

In [[quantum information theory]], the conditional entropy is generalized to the [[conditional quantum entropy]]. The latter can take negative values, unlike its classical counterpart.

== See also ==

* [[Entropy (information theory)]]
* [[Mutual information]]
* [[Conditional quantum entropy]]
* [[Variation of information]]
* [[Entropy power inequality]]
* [[Likelihood function]]

==References==

{{Reflist}}

[[Category:Entropy and information]]
[[Category:Information theory]]