An intuitive interpretation of the definition is the following: by definition, <math>\displaystyle H( Y|X) =\mathbb{E}( \ f( X,Y) \ )</math>, where <math>\displaystyle f:( x,y) \ \rightarrow -\log( \ p( y|x) \ ) </math>. The function <math>\displaystyle f</math> associates with <math>\displaystyle ( x,y)</math> the information content of <math>\displaystyle ( Y=y)</math> given <math>\displaystyle (X=x)</math>, which is the amount of information needed to describe the event <math>\displaystyle (Y=y)</math> given <math>(X=x)</math>. By the law of large numbers, <math>H(Y|X)</math> is the arithmetic mean of a large number of independent realizations of <math>\displaystyle f(X,Y)</math>.
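As a concrete sketch of this definition, the average of <math>-\log_2 p(y|x)</math> under the joint distribution can be computed directly. The joint distribution below is invented purely for illustration; any valid joint distribution works the same way.

```python
import math

# Hypothetical joint distribution p(x, y) over two binary variables,
# chosen only to illustrate the definition.
p_xy = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.4, (1, 1): 0.1}

# Marginal p(x) obtained by summing the joint over y.
p_x = {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

# H(Y|X) = E[-log2 p(y|x)] = sum_{x,y} p(x,y) * -log2( p(x,y) / p(x) )
h_y_given_x = sum(p * -math.log2(p / p_x[x]) for (x, y), p in p_xy.items())
print(h_y_given_x)  # about 0.861 bits for this joint
```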
== Motivation ==
Here <math>\operatorname{I}(y_i)</math> is the information content of the outcome of <math>Y</math> taking the value <math>y_i</math>. Similarly, the entropy of <math>Y</math> conditioned on <math>X</math> taking the value <math>x</math> can be defined by the conditional expectation:
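That per-value quantity is <math>H(Y|X=x) = -\sum_y p(y|x)\log_2 p(y|x)</math>, and weighting it by <math>p(x)</math> recovers <math>H(Y|X)</math>. A minimal sketch, with distributions invented only for the example:

```python
import math

# Hypothetical conditional distribution p(y|x) and marginal p(x),
# invented only to illustrate the definition.
p_y_given_x = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.8, 1: 0.2}}
p_x = {0: 0.5, 1: 0.5}

def entropy_given(x):
    """H(Y | X = x) = -sum_y p(y|x) * log2 p(y|x)."""
    return -sum(p * math.log2(p) for p in p_y_given_x[x].values() if p > 0)

# Averaging the per-value entropies over p(x) yields H(Y|X).
h_y_given_x = sum(p_x[x] * entropy_given(x) for x in p_x)
```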
<!-- This paragraph is incorrect; the last line is not the KL divergence between any two distributions, since p(x) is [in general] not a valid distribution over the domains of X and Y. The last formula above is the [[Kullback-Leibler divergence]], also known as relative entropy. Relative entropy is always positive, and vanishes if and only if <math>p(x,y) = p(x)</math>. This is when knowing <math>x</math> tells us everything about <math>y</math>. ADDED: Could this comment be out of date since the KL divergence is not mentioned above? November 2014 -->
== Properties ==
Conversely, <math>H(Y|X) = H(Y)</math> if and only if <math>Y</math> and <math>X</math> are independent random variables.
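This equality can be checked numerically by building an independent joint <math>p(x,y) = p(x)\,p(y)</math>; the marginals below are made up for the example:

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a distribution given as {value: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def conditional_entropy(p_xy, p_x):
    """H(Y|X) from a joint {(x, y): prob} and marginal {x: prob}."""
    return sum(p * -math.log2(p / p_x[x]) for (x, y), p in p_xy.items() if p > 0)

# Independent X and Y: p(x, y) = p(x) * p(y), with invented marginals.
p_x = {0: 0.3, 1: 0.7}
p_y = {0: 0.6, 1: 0.4}
p_xy = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}

# For independent variables, H(Y|X) equals H(Y).
assert abs(conditional_entropy(p_xy, p_x) - entropy(p_y)) < 1e-9
```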
=== Chain rule ===
It has a form similar to the chain rule in probability theory, except that it uses addition instead of multiplication.
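The additive chain rule <math>H(X,Y) = H(X) + H(Y|X)</math> mirrors the multiplicative identity <math>p(x,y) = p(x)\,p(y|x)</math>, and can be verified on a small example (the joint distribution is invented for illustration):

```python
import math

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution, for illustration only.
p_xy = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.4, (1, 1): 0.1}
p_x = {0: 0.5, 1: 0.5}

h_joint = entropy(p_xy.values())
h_x = entropy(p_x.values())
h_y_given_x = sum(p * -math.log2(p / p_x[x]) for (x, y), p in p_xy.items())

# Chain rule: H(X, Y) = H(X) + H(Y|X) -- additive, where the probability
# chain rule p(x, y) = p(x) * p(y|x) is multiplicative.
assert abs(h_joint - (h_x + h_y_given_x)) < 1e-9
```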
=== Bayes' rule ===
:<math>H(Y|X,Z) \,=\, H(Y|X).</math>
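This identity holds when <math>Y</math> is conditionally independent of <math>Z</math> given <math>X</math>, i.e. when <math>p(x,y,z) = p(x)\,p(y|x)\,p(z|x)</math>. A numeric check, with all distributions invented for the example:

```python
import math

# Conditionally independent construction: p(x, y, z) = p(x) p(y|x) p(z|x).
p_x = {0: 0.5, 1: 0.5}
p_y_x = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.8, 1: 0.2}}
p_z_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}

p_xyz = {(x, y, z): p_x[x] * p_y_x[x][y] * p_z_x[x][z]
         for x in p_x for y in (0, 1) for z in (0, 1)}

# H(Y|X,Z) = sum_{x,y,z} p(x,y,z) * -log2 p(y|x,z)
p_xz = {}
for (x, y, z), p in p_xyz.items():
    p_xz[(x, z)] = p_xz.get((x, z), 0.0) + p
h_y_xz = sum(p * -math.log2(p / p_xz[(x, z)]) for (x, y, z), p in p_xyz.items())

# H(Y|X) = sum_{x,y} p(x,y) * -log2 p(y|x)
p_xy = {}
for (x, y, z), p in p_xyz.items():
    p_xy[(x, y)] = p_xy.get((x, y), 0.0) + p
h_y_x = sum(p * -math.log2(p / p_x[x]) for (x, y), p in p_xy.items())

# Under conditional independence, conditioning on Z adds nothing.
assert abs(h_y_xz - h_y_x) < 1e-9
```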
=== Other properties ===