{{Short description|Measure of relative information in probability theory}}

{{Information theory}}

[[Venn diagram showing additive and subtractive relationships among various information measures associated with correlated variables <math>X</math> and <math>Y</math>. The area contained by both circles is the joint entropy <math>\Eta(X,Y)</math>. The circle on the left (red and violet) is the individual entropy <math>\Eta(X)</math>, with the red being the conditional entropy <math>\Eta(X|Y)</math>. The circle on the right (blue and violet) is <math>\Eta(Y)</math>, with the blue being <math>\Eta(Y|X)</math>. The violet is the mutual information <math>\operatorname{I}(X;Y)</math>.]]

In [[information theory]], the '''conditional entropy''' quantifies the amount of information needed to describe the outcome of a [[random variable]] <math>Y</math> given that the value of another random variable <math>X</math> is known. Here, information is measured in [[Shannon (unit)|shannon]]s, [[Nat (unit)|nat]]s, or [[Hartley (unit)|hartley]]s. The ''entropy of <math>Y</math> conditioned on <math>X</math>'' is written as <math>\Eta(Y|X)</math>.

== Definition ==

The conditional entropy of <math>Y</math> given <math>X</math> is defined as

{{Equation box 1
|indent =
|title=
|equation = {{NumBlk||<math>\Eta(Y|X)\ = -\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log \frac {p(x,y)} {p(x)}</math>|{{EquationRef|Eq.1}}}}
|cellpadding= 6
|border
|border colour = #0073CF
|background colour=#F5FFFA}}

where <math>\mathcal X</math> and <math>\mathcal Y</math> denote the support sets of <math>X</math> and <math>Y</math>.

''Note:'' By convention, the expressions <math>0 \log 0</math> and <math>0 \log c/0</math> for fixed <math>c > 0</math> are treated as being equal to zero. This is because <math>\lim_{\theta\to0^+} \theta\, \log \,c/\theta = 0</math> and <math>\lim_{\theta\to0^+} \theta\, \log \theta = 0</math>.<ref>{{Cite web|url=http://www.inference.org.uk/mackay/itprnn/book.html|title=David MacKay: Information Theory, Pattern Recognition and Neural Networks: The Book|website=www.inference.org.uk|access-date=2019-10-25}}</ref> <!-- because p(x,y) could still equal 0 even if p(x) != 0 and p(y) != 0. What about p(x,y)=p(x)=0? -->
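
To make Eq.1 concrete, here is a minimal numerical sketch, not part of the original article: it computes <math>\Eta(Y|X)</math> from a joint probability table. The function name and the example table are our own illustrative choices, and the <math>0 \log 0</math> convention above is applied by zeroing out empty cells.

<syntaxhighlight lang="python">
import numpy as np

def conditional_entropy(joint):
    """H(Y|X) in bits from a joint pmf table joint[x, y] (rows: x, columns: y).

    Implements Eq.1: H(Y|X) = -sum_{x,y} p(x,y) * log2(p(x,y)/p(x)),
    with the convention that p(x,y) = 0 terms contribute zero.
    """
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1, keepdims=True)                 # marginal p(x)
    # p(y|x), with zero-probability cells mapped to 1 so that log2(...) = 0 there
    cond = np.where(joint > 0, joint / np.where(p_x > 0, p_x, 1.0), 1.0)
    return -np.sum(joint * np.log2(cond))

# A small joint pmf over X in {0, 1} and Y in {0, 1}:
p = np.array([[0.375, 0.125],
              [0.125, 0.375]])
print(conditional_entropy(p))   # ~0.811 bits; H(Y) alone would be 1 bit
</syntaxhighlight>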

Intuitive explanation of the definition:

According to the definition, <math>\displaystyle H( Y|X) =\mathbb{E}( \ f( X,Y) \ )</math> where <math>\displaystyle f:( x,y) \ \rightarrow -\log( \ p( y|x) \ ) .</math> <math>\displaystyle f</math> associates to <math>\displaystyle ( x,y)</math> the information content of <math>\displaystyle ( Y=y)</math> given <math>\displaystyle (X=x)</math>, which is the amount of information needed to describe the event <math>\displaystyle (Y=y)</math> given <math>(X=x)</math>. According to the law of large numbers, <math>\displaystyle H(Y|X)</math> is the arithmetic mean of a large number of independent realizations of <math>\displaystyle f(X,Y)</math>.
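
To illustrate the law-of-large-numbers reading, the sketch below, our own addition with an assumed joint table, samples many pairs <math>(x,y)</math> and averages <math>-\log_2 p(y|x)</math>; the sample mean approaches <math>\Eta(Y|X)</math>.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Assumed joint pmf p[x, y] over X in {0, 1} and Y in {0, 1}
p = np.array([[0.375, 0.125],
              [0.125, 0.375]])
p_y_given_x = p / p.sum(axis=1, keepdims=True)    # conditional table p(y|x)

# Draw (x, y) pairs from the joint and average f(x, y) = -log2 p(y|x)
flat = rng.choice(p.size, size=200_000, p=p.ravel())
x, y = np.unravel_index(flat, p.shape)
f = -np.log2(p_y_given_x[x, y])                   # information content of (Y=y) given (X=x)
print(f.mean())                                   # ~0.811 bits, approaching H(Y|X)
</syntaxhighlight>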

== Motivation ==

Let <math>\Eta(Y|X=x)</math> be the [[Shannon Entropy|entropy]] of the discrete random variable <math>Y</math> conditioned on the discrete random variable <math>X</math> taking a certain value <math>x</math>. Denote the support sets of <math>X</math> and <math>Y</math> by <math>\mathcal X</math> and <math>\mathcal Y</math>. Let <math>Y</math> have [[probability mass function]] <math>p_Y{(y)}</math>. The unconditional entropy of <math>Y</math> is calculated as <math>\Eta(Y) := \mathbb{E}[\operatorname{I}(Y)]</math>, i.e.

:<math>\Eta(Y) = \sum_{y\in\mathcal Y} {\mathrm{Pr}(Y=y)\,\mathrm{I}(y)} = -\sum_{y\in\mathcal Y} {p_Y(y) \log_2{p_Y(y)}},</math>

where <math>\operatorname{I}(y_i)</math> is the [[information content]] of the [[Outcome (probability)|outcome]] of <math>Y</math> taking the value <math>y_i</math>. The entropy of <math>Y</math> conditioned on <math>X</math> taking the value <math>x</math> is defined analogously by [[conditional expectation]]:

:<math>\Eta(Y|X=x) = -\sum_{y\in\mathcal Y} {\Pr(Y = y|X=x) \log_2{\Pr(Y = y|X=x)}}.</math>

Note that <math>\Eta(Y|X)</math> is the result of averaging <math>\Eta(Y|X=x)</math> over all possible values <math>x</math> that <math>X</math> may take. Also, if the above sum is taken over a sample <math>y_1, \dots, y_n</math>, the expected value <math>E_X[ \Eta(y_1, \dots, y_n \mid X = x)]</math> is known in some domains as '''equivocation'''.<ref>{{cite journal|author1=Hellman, M.|author2=Raviv, J.|year=1970|title=Probability of error, equivocation, and the Chernoff bound|journal=IEEE Transactions on Information Theory|volume=16|issue=4|pp=368-372}}</ref>

Given [[Discrete random variable|discrete random variables]] <math>X</math> with image <math>\mathcal X</math> and <math>Y</math> with image <math>\mathcal Y</math>, the conditional entropy of <math>Y</math> given <math>X</math> is defined as the weighted sum of <math>\Eta(Y|X=x)</math> for each possible value of <math>x</math>, using <math>p(x)</math> as the weights:<ref name=cover1991>{{cite book|isbn=0-471-06259-6|year=1991|authorlink1=Thomas M. Cover|author1=T. Cover|author2=J. Thomas|title=Elements of Information Theory|url=https://archive.org/details/elementsofinform0000cove|url-access=registration}}</ref>{{rp|15}}

:<math>
\begin{align}
\Eta(Y|X)\ &\equiv \sum_{x\in\mathcal X}\,p(x)\,\Eta(Y|X=x)\\
& =-\sum_{x\in\mathcal X} p(x)\sum_{y\in\mathcal Y}\,p(y|x)\,\log\, p(y|x)\\
& =-\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}\,p(x,y)\,\log\,p(y|x)\\
& =-\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log\,p(y|x)\\
& =-\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log \frac {p(x,y)} {p(x)} \\
& = \sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log \frac {p(x)} {p(x,y)}.
\end{align}
</math>
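
The weighted-sum form above and the double sum in Eq.1 agree, which can be spot-checked numerically. A small sketch of our own, reusing the assumed joint table from the earlier snippets (each snippet is self-contained):

<syntaxhighlight lang="python">
import numpy as np

p = np.array([[0.375, 0.125],            # assumed joint pmf p(x, y)
              [0.125, 0.375]])
p_x = p.sum(axis=1)

# Weighted sum of per-value entropies: sum_x p(x) * H(Y|X=x) ...
weighted = 0.0
for i, px in enumerate(p_x):
    cond = p[i] / px                     # p(y | X = x_i)
    weighted += px * -np.sum(cond * np.log2(cond))

# ... equals the double-sum form of Eq.1
direct = -np.sum(p * np.log2(p / p_x[:, None]))
print(np.isclose(weighted, direct))      # True
</syntaxhighlight>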

<!-- This paragraph is incorrect; the last line is not the KL divergence between any two distributions, since p(x) is [in general] not a valid distribution over the domains of X and Y. The last formula above is the [[Kullback-Leibler divergence]], also known as relative entropy. Relative entropy is always positive, and vanishes if and only if <math>p(x,y) = p(x)</math>. This is when knowing <math>x</math> tells us everything about <math>y</math>. ADDED: Could this comment be out of date since the KL divergence is not mentioned above? November 2014 -->

==Properties==

===Conditional entropy equals zero===

<math>\Eta(Y|X)=0</math> if and only if the value of <math>Y</math> is completely determined by the value of <math>X</math>.

===Conditional entropy of independent random variables===

Conversely, <math>\Eta(Y|X) = \Eta(Y)</math> if and only if <math>Y</math> and <math>X</math> are [[independent random variables]].

===Chain rule===

Assume that the combined system determined by two random variables <math>X</math> and <math>Y</math> has [[joint entropy]] <math>\Eta(X,Y)</math>, that is, we need <math>\Eta(X,Y)</math> bits of information on average to describe its exact state. Now if we first learn the value of <math>X</math>, we have gained <math>\Eta(X)</math> bits of information. Once <math>X</math> is known, we only need <math>\Eta(X,Y)-\Eta(X)</math> bits to describe the state of the whole system. This quantity is exactly <math>\Eta(Y|X)</math>, which gives the ''chain rule'' of conditional entropy:

:<math>\Eta(Y|X)\, = \, \Eta(X,Y)- \Eta(X).</math><ref name=cover1991 />{{rp|17}}

The chain rule follows from the above definition of conditional entropy:

:<math>\begin{align}
\Eta(Y|X) &= \sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log \left(\frac{p(x)}{p(x,y)} \right) \\[4pt]
&= \sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)(\log (p(x))-\log (p(x,y))) \\[4pt]
&= -\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log (p(x,y)) + \sum_{x\in\mathcal X, y\in\mathcal Y}{p(x,y)\log(p(x))} \\[4pt]
& = \Eta(X,Y) + \sum_{x \in \mathcal X} p(x)\log (p(x) ) \\[4pt]
& = \Eta(X,Y) - \Eta(X).
\end{align}</math>

In general, a chain rule for multiple random variables holds:

:<math> \Eta(X_1,X_2,\ldots,X_n) = \sum_{i=1}^n \Eta(X_i | X_1, \ldots, X_{i-1}) </math><ref name=cover1991 />{{rp|22}}

It has a similar form to the [[Chain rule (probability)|chain rule]] in probability theory, except that addition instead of multiplication is used.
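
Both forms of the chain rule are easy to verify numerically. The following sketch, our own addition, draws an arbitrary joint table and checks <math>\Eta(Y|X) = \Eta(X,Y) - \Eta(X)</math> against the definition:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
p = rng.random((3, 4))
p /= p.sum()                              # arbitrary joint pmf p(x, y), all entries > 0

def H(q):
    """Shannon entropy in bits of a pmf given as any array of probabilities."""
    q = np.asarray(q).ravel()
    return -np.sum(q * np.log2(q))

chain = H(p) - H(p.sum(axis=1))           # H(X,Y) - H(X)
direct = -np.sum(p * np.log2(p / p.sum(axis=1, keepdims=True)))
print(np.isclose(chain, direct))          # True
</syntaxhighlight>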

===Bayes' rule===

[[Bayes' rule]] for conditional entropy states

:<math>\Eta(Y|X) \,=\, \Eta(X|Y) - \Eta(X) + \Eta(Y).</math>

''Proof.'' <math>\Eta(Y|X) = \Eta(X,Y) - \Eta(X)</math> and <math>\Eta(X|Y) = \Eta(Y,X) - \Eta(Y)</math>. Symmetry entails <math>\Eta(X,Y) = \Eta(Y,X)</math>. Subtracting the two equations implies Bayes' rule.
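
The identity can also be spot-checked numerically; this sketch, our own and in the same style as the earlier ones, confirms it on a random joint table:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
p = rng.random((3, 3))
p /= p.sum()                                        # joint pmf p(x, y), all entries > 0

def H(q):
    q = np.asarray(q).ravel()
    return -np.sum(q * np.log2(q))

H_x, H_y, H_xy = H(p.sum(axis=1)), H(p.sum(axis=0)), H(p)
H_y_given_x = H_xy - H_x                            # chain rule
H_x_given_y = H_xy - H_y
print(np.isclose(H_y_given_x, H_x_given_y - H_x + H_y))   # Bayes' rule: True
</syntaxhighlight>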

If <math>Y</math> is [[Conditional independence|conditionally independent]] of <math>Z</math> given <math>X</math> we have:

:<math>\Eta(Y|X,Z) \,=\, \Eta(Y|X).</math>

===Other properties===

For any <math>X</math> and <math>Y</math>:

:<math display="block">\begin{align}
\Eta(Y|X) &\le \Eta(Y) \, \\
\Eta(X,Y) &= \Eta(X|Y) + \Eta(Y|X) + \operatorname{I}(X;Y),\qquad \\
\Eta(X,Y) &= \Eta(X) + \Eta(Y) - \operatorname{I}(X;Y),\, \\
\operatorname{I}(X;Y) &\le \Eta(X),\,
\end{align}</math>

where <math>\operatorname{I}(X;Y)</math> is the [[mutual information]] between <math>X</math> and <math>Y</math>.

For independent <math>X</math> and <math>Y</math>:

:<math>\Eta(Y|X) = \Eta(Y) </math> and <math>\Eta(X|Y) = \Eta(X) \, </math>
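
These relations can be checked on any joint table; the sketch below, our own addition, verifies the inequality and the mutual-information identities at once:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
p = rng.random((4, 4))
p /= p.sum()                               # joint pmf p(x, y), all entries > 0

def H(q):
    q = np.asarray(q).ravel()
    return -np.sum(q * np.log2(q))

H_x, H_y, H_xy = H(p.sum(axis=1)), H(p.sum(axis=0)), H(p)
I_xy = H_x + H_y - H_xy                    # mutual information I(X;Y)
print(H_xy - H_x <= H_y)                                       # H(Y|X) <= H(Y)
print(np.isclose(H_xy, (H_xy - H_y) + (H_xy - H_x) + I_xy))    # H(X,Y) = H(X|Y) + H(Y|X) + I(X;Y)
print(bool(0 <= I_xy <= H_x))                                  # I(X;Y) <= H(X)
</syntaxhighlight>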

Although the specific-conditional entropy <math>\Eta(X|Y=y)</math> can be either less or greater than <math>\Eta(X)</math> for a given [[random variate]] <math>y</math> of <math>Y</math>, <math>\Eta(X|Y)</math> can never exceed <math>\Eta(X)</math>.

== Conditional differential entropy ==

=== Definition ===

The above definition is for discrete random variables. The continuous version of discrete conditional entropy is called ''conditional differential (or continuous) entropy''. Let <math>X</math> and <math>Y</math> be continuous random variables with a [[joint probability density function]] <math>f(x,y)</math>. The differential conditional entropy <math>h(X|Y)</math> is defined as<ref name=cover1991 />{{rp|249}}

{{Equation box 1
|indent =
|title=
|equation = {{NumBlk||<math>h(X|Y) = -\int_{\mathcal X, \mathcal Y} f(x,y)\log f(x|y)\,dx dy</math>|{{EquationRef|Eq.2}}}}
|cellpadding= 6
|border
|border colour = #0073CF
|background colour=#F5FFFA}}

=== Properties ===

In contrast to the conditional entropy for discrete random variables, the conditional differential entropy may be negative.
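
For example, for jointly Gaussian variables the conditional differential entropy has the standard closed form <math>h(X|Y) = \tfrac{1}{2}\log\bigl(2\pi e\,\sigma_X^2(1-\rho^2)\bigr)</math> (in nats, with correlation coefficient <math>\rho</math>), which becomes negative once the conditional variance is small enough. A short sketch of our own, not from the article:

<syntaxhighlight lang="python">
import numpy as np

def h_cond_gaussian(sigma_x, rho):
    """h(X|Y) in nats for jointly Gaussian (X, Y) with correlation rho."""
    return 0.5 * np.log(2 * np.pi * np.e * sigma_x**2 * (1 - rho**2))

print(h_cond_gaussian(1.0, 0.0))   # ~1.419 nats: conditioning on an independent Y does not help
print(h_cond_gaussian(0.1, 0.9))   # ~-1.714 nats: negative, unlike discrete conditional entropy
</syntaxhighlight>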

As in the discrete case there is a chain rule for differential entropy:

:<math>h(Y|X)\,=\,h(X,Y)-h(X)</math><ref name=cover1991 />{{rp|253}}

Notice however that this rule may not be true if the involved differential entropies do not exist or are infinite.

Joint differential entropy is also used in the definition of the [[mutual information]] between continuous random variables:

:<math>\operatorname{I}(X,Y)=h(X)-h(X|Y)=h(Y)-h(Y|X)</math>

<math>h(X|Y) \le h(X)</math> with equality if and only if <math>X</math> and <math>Y</math> are independent.<ref name=cover1991 />{{rp|253}}

===Relation to estimator error===

The conditional differential entropy yields a lower bound on the expected squared error of an [[estimator]]. For any random variable <math>X</math>, observation <math>Y</math> and estimator <math>\widehat{X}</math> the following holds:<ref name=cover1991 />{{rp|255}}

:<math display="block">\mathbb{E}\left[\bigl(X - \widehat{X}{(Y)}\bigr)^2\right] \ge \frac{1}{2\pi e}e^{2h(X|Y)}</math>

This is related to the [[uncertainty principle]] from [[quantum mechanics]].
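
In the jointly Gaussian case the bound is met with equality by the minimum mean-square estimator. The Monte Carlo sketch below is our own addition, assuming <math>X,Y</math> standard normal with correlation <math>\rho</math> and <math>\widehat{X}(Y)=\rho Y</math>:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(4)
rho, n = 0.8, 500_000
y = rng.standard_normal(n)
x = rho * y + np.sqrt(1 - rho**2) * rng.standard_normal(n)   # corr(X, Y) = rho

mse = np.mean((x - rho * y) ** 2)                    # E[(X - X_hat(Y))^2] for X_hat = rho * Y
h_x_given_y = 0.5 * np.log(2 * np.pi * np.e * (1 - rho**2))  # Gaussian h(X|Y) in nats
bound = np.exp(2 * h_x_given_y) / (2 * np.pi * np.e)
print(mse, ">=", bound)                              # both ~0.36: equality in the Gaussian case
</syntaxhighlight>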

==Generalization to quantum theory==

In [[quantum information theory]], the conditional entropy is generalized to the [[conditional quantum entropy]]. The latter can take negative values, unlike its classical counterpart.

== See also ==

* [[Entropy (information theory)]]
* [[Mutual information]]
* [[Conditional quantum entropy]]
* [[Variation of information]]
* [[Entropy power inequality]]
* [[Likelihood function]]

==References==

{{Reflist}}

[[Category:Entropy and information]]
[[Category:Information theory]]