第173行:
第173行:
| | | |
| | | |
− | where <math>Ha(X)</math> and <math>H(Y)</math> are the marginal [[information entropy|entropies]], <math>H(X|Y)</math> and <math>H(Y|X)</math> are the [[conditional entropy|conditional entropies]], and <math>H(X,Y)</math> is the [[joint entropy]] of <math>X</math> and <math>Y</math>. | + | where <math>H(X)</math> and <math>H(Y)</math> are the marginal [[information entropy|entropies]], <math>H(X|Y)</math> and <math>H(Y|X)</math> are the [[conditional entropy|conditional entropies]], and <math>H(X,Y)</math> is the [[joint entropy]] of <math>X</math> and <math>Y</math>. |
| | | |
− | where <math>H(X)</math> and <math>H(Y)</math> are the marginal entropies, <math>H(X|Y)</math> and <math>H(Y|X)</math> are the conditional entropies, and <math>H(X,Y)</math> is the joint entropy of <math>X</math> and <math>Y</math>.
| |
| | | |
− | 其中H(X)和H(Y)是边际熵,H(X | Y)和H(Y | X)是条件熵,H(X,Y)是<math>X</math>和<math>Y</math>的联合熵。
| + | 其中<math>H(X)</math>和<math>H(Y)</math>是'''边际熵 Marginal entropy''',<math>H(X|Y)</math>和<math>H(Y|X)</math>表示'''条件熵 Conditional entropy''',<math>H(X,Y)</math>是<math>X</math>和<math>Y</math>的'''联合熵 Joint entropy'''。 |
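As a quick illustration of how these identities fit together, here is a minimal numeric sketch in Python. The 2×2 joint distribution and the helper function <code>H</code> are our own illustrative choices, not part of the article:

<syntaxhighlight lang="python">
# Minimal check that I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)
# on a toy joint distribution (the probabilities below are arbitrary illustrative values).
import math

p_xy = {(0, 0): 0.4, (0, 1): 0.1,
        (1, 0): 0.2, (1, 1): 0.3}
p_x = {0: 0.5, 1: 0.5}   # marginals of p_xy
p_y = {0: 0.6, 1: 0.4}

def H(dist):
    """Shannon entropy in bits of a probability dictionary."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Conditional entropies computed directly from their definitions
H_X_given_Y = -sum(p * math.log2(p / p_y[y]) for (x, y), p in p_xy.items())
H_Y_given_X = -sum(p * math.log2(p / p_x[x]) for (x, y), p in p_xy.items())

# Mutual information from its defining double sum
I = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items())

print(I)                             # ≈ 0.125 bits
print(H(p_x) - H_X_given_Y)          # same value
print(H(p_y) - H_Y_given_X)          # same value
print(H(p_x) + H(p_y) - H(p_xy))     # same value
</syntaxhighlight>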
| | | |
| | | |
| | | |
| | | |
− |
| |
− | Notice the analogy to the union, difference, and intersection of two sets: in this respect, all the formulas given above are apparent from the Venn diagram reported at the beginning of the article.
| |
| | | |
| Notice the analogy to the union, difference, and intersection of two sets: in this respect, all the formulas given above are apparent from the Venn diagram reported at the beginning of the article. | | Notice the analogy to the union, difference, and intersection of two sets: in this respect, all the formulas given above are apparent from the Venn diagram reported at the beginning of the article. |
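To spell the analogy out (the sets <math>A, B</math> and the measure <math>\mu</math> below are our own notation for the regions of the Venn diagram, not symbols used elsewhere in the article): reading <math>H(X)</math> as <math>\mu(A)</math>, <math>H(Y)</math> as <math>\mu(B)</math>, <math>H(X,Y)</math> as <math>\mu(A \cup B)</math>, <math>H(X|Y)</math> as <math>\mu(A \setminus B)</math> and <math>\operatorname{I}(X;Y)</math> as <math>\mu(A \cap B)</math>, the identity <math>\operatorname{I}(X;Y) = H(X) + H(Y) - H(X,Y)</math> becomes the inclusion-exclusion formula <math>\mu(A \cap B) = \mu(A) + \mu(B) - \mu(A \cup B)</math>, and <math>H(X|Y) = H(X,Y) - H(Y)</math> becomes <math>\mu(A \setminus B) = \mu(A \cup B) - \mu(B)</math>.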
第194行: |
第191行: |
| In terms of a communication channel in which the output <math>Y</math> is a noisy version of the input <math>X</math>, these relations are summarised in the figure: | | In terms of a communication channel in which the output <math>Y</math> is a noisy version of the input <math>X</math>, these relations are summarised in the figure: |
| | | |
− | In terms of a communication channel in which the output <math>Y</math> is a noisy version of the input <math>X</math>, these relations are summarised in the figure:
| |
| | | |
| 就输出<math>Y</math>是输入<math>X</math>的噪声版本的通信信道而言,这些关系如图中总结所示: | | 就输出<math>Y</math>是输入<math>X</math>的噪声版本的通信信道而言,这些关系如图中总结所示: |
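As a concrete instance of this channel picture, the following sketch computes these quantities for a binary symmetric channel; the crossover probability <code>eps = 0.1</code> and the uniform input are our own illustrative assumptions:

<syntaxhighlight lang="python">
# Binary symmetric channel: Y is X flipped with probability eps (eps chosen arbitrarily).
import math

eps = 0.1
p_x = {0: 0.5, 1: 0.5}                                   # uniform input
p_y_given_x = {(0, 0): 1 - eps, (0, 1): eps,
               (1, 0): eps,     (1, 1): 1 - eps}         # channel transition probabilities

p_xy = {(x, y): p_x[x] * p_y_given_x[(x, y)] for (x, y) in p_y_given_x}
p_y = {y: sum(p for (_, y2), p in p_xy.items() if y2 == y) for y in (0, 1)}

def H(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

H_Y = H(p_y)                                             # uncertainty of the received symbol
H_Y_given_X = -sum(p * math.log2(p_y_given_x[(x, y)])    # conditional uncertainty added by the noise
                   for (x, y), p in p_xy.items())
print(H_Y - H_Y_given_X)                                 # I(X;Y) ≈ 0.531 bits for eps = 0.1
</syntaxhighlight>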
第208行: |
第204行: |
| Because <math>\operatorname{I}(X;Y)</math> is non-negative, consequently, <math>H(X) \ge H(X|Y)</math>. Here we give the detailed deduction of <math>\operatorname{I}(X;Y)=H(Y)-H(Y|X)</math> for the case of jointly discrete random variables: | | Because <math>\operatorname{I}(X;Y)</math> is non-negative, consequently, <math>H(X) \ge H(X|Y)</math>. Here we give the detailed deduction of <math>\operatorname{I}(X;Y)=H(Y)-H(Y|X)</math> for the case of jointly discrete random variables: |
| | | |
− | Because <math>\operatorname{I}(X;Y)</math> is non-negative, consequently, <math>\Eta(X) \ge \Eta(X|Y)</math>. Here we give the detailed deduction of <math>\operatorname{I}(X;Y)=\Eta(Y)-\Eta(Y|X)</math> for the case of jointly discrete random variables:
| |
| | | |
| 因为<math>\operatorname{I}(X;Y)</math>是非负的,因此<math>H(X) \ge H(X|Y)</math>。这里我们给出联合离散随机变量情形下结论<math>\operatorname{I}(X;Y)=H(Y)-H(Y|X)</math>的详细推导过程: | | 因为<math>\operatorname{I}(X;Y)</math>是非负的,因此<math>H(X) \ge H(X|Y)</math>。这里我们给出联合离散随机变量情形下结论<math>\operatorname{I}(X;Y)=H(Y)-H(Y|X)</math>的详细推导过程: |
| + | |
| | | |
| | | |
第216行: |
第212行: |
| | | |
| | | |
− |
| |
− | The proofs of the other identities above are similar. The proof of the general case (not just discrete) is similar, with integrals replacing sums.
| |
| | | |
| The proofs of the other identities above are similar. The proof of the general case (not just discrete) is similar, with integrals replacing sums. | | The proofs of the other identities above are similar. The proof of the general case (not just discrete) is similar, with integrals replacing sums. |
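For the continuous case, here is a brief sketch using a bivariate Gaussian, for which the differential entropies have well-known closed forms; the variances and the correlation <code>rho = 0.8</code> are arbitrary illustrative values:

<syntaxhighlight lang="python">
# Continuous example: bivariate Gaussian, where the sums become integrals and
# I(X;Y) = h(X) + h(Y) - h(X,Y) = -0.5 * ln(1 - rho^2)  (in nats).
import math

sigma_x, sigma_y, rho = 1.0, 2.0, 0.8
det_cov = sigma_x**2 * sigma_y**2 * (1 - rho**2)              # determinant of the covariance matrix

h_x  = 0.5 * math.log(2 * math.pi * math.e * sigma_x**2)      # differential entropy of X
h_y  = 0.5 * math.log(2 * math.pi * math.e * sigma_y**2)      # differential entropy of Y
h_xy = 0.5 * math.log((2 * math.pi * math.e)**2 * det_cov)    # joint differential entropy

print(h_x + h_y - h_xy)                 # ≈ 0.511 nats
print(-0.5 * math.log(1 - rho**2))      # same value
</syntaxhighlight>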
第229行: |
第223行: |
| Intuitively, if entropy <math>H(Y)</math> is regarded as a measure of uncertainty about a random variable, then <math>H(Y|X)</math> is a measure of what <math>X</math> does ''not'' say about <math>Y</math>. This is "the amount of uncertainty remaining about <math>Y</math> after <math>X</math> is known", and thus the right side of the second of these equalities can be read as "the amount of uncertainty in <math>Y</math>, minus the amount of uncertainty in <math>Y</math> which remains after <math>X</math> is known", which is equivalent to "the amount of uncertainty in <math>Y</math> which is removed by knowing <math>X</math>". This corroborates the intuitive meaning of mutual information as the amount of information (that is, reduction in uncertainty) that knowing either variable provides about the other. | | Intuitively, if entropy <math>H(Y)</math> is regarded as a measure of uncertainty about a random variable, then <math>H(Y|X)</math> is a measure of what <math>X</math> does ''not'' say about <math>Y</math>. This is "the amount of uncertainty remaining about <math>Y</math> after <math>X</math> is known", and thus the right side of the second of these equalities can be read as "the amount of uncertainty in <math>Y</math>, minus the amount of uncertainty in <math>Y</math> which remains after <math>X</math> is known", which is equivalent to "the amount of uncertainty in <math>Y</math> which is removed by knowing <math>X</math>". This corroborates the intuitive meaning of mutual information as the amount of information (that is, reduction in uncertainty) that knowing either variable provides about the other. |
| | | |
− | Intuitively, if entropy <math>\Eta(Y)</math> is regarded as a measure of uncertainty about a random variable, then <math>\Eta(Y|X)</math> is a measure of what <math>X</math> does not say about <math>Y</math>. This is "the amount of uncertainty remaining about <math>Y</math> after <math>X</math> is known", and thus the right side of the second of these equalities can be read as "the amount of uncertainty in <math>Y</math>, minus the amount of uncertainty in <math>Y</math> which remains after <math>X</math> is known", which is equivalent to "the amount of uncertainty in <math>Y</math> which is removed by knowing <math>X</math>". This corroborates the intuitive meaning of mutual information as the amount of information (that is, reduction in uncertainty) that knowing either variable provides about the other.
| |
| | | |
| 直观地说,如果熵<math>H(Y)</math>被看作是对一个随机变量不确定性的度量,那么<math>H(Y|X)</math>就是对<math>X</math>没有说明的关于<math>Y</math>的那部分的度量。这是“在<math>X</math>已知后<math>Y</math>中剩余的不确定性量”,因此,第二个等式的右边可以解读为“<math>Y</math>中的不确定性量,减去在<math>X</math>已知后<math>Y</math>中仍然存在的不确定性量”,这相当于“通过知道<math>X</math>而从<math>Y</math>中去除的不确定性量”。这证实了互信息的直观含义,即知道其中任一变量所提供的关于另一个变量的信息量(即不确定性的减少)。 | | 直观地说,如果熵<math>H(Y)</math>被看作是对一个随机变量不确定性的度量,那么<math>H(Y|X)</math>就是对<math>X</math>没有说明的关于<math>Y</math>的那部分的度量。这是“在<math>X</math>已知后<math>Y</math>中剩余的不确定性量”,因此,第二个等式的右边可以解读为“<math>Y</math>中的不确定性量,减去在<math>X</math>已知后<math>Y</math>中仍然存在的不确定性量”,这相当于“通过知道<math>X</math>而从<math>Y</math>中去除的不确定性量”。这证实了互信息的直观含义,即知道其中任一变量所提供的关于另一个变量的信息量(即不确定性的减少)。 |
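The two extreme cases of this reading can be checked directly; the toy joint distributions below (a perfect copy and an independent pair) are our own illustrative choices:

<syntaxhighlight lang="python">
# "Uncertainty about Y removed by knowing X" in two extreme cases.
import math

def mi(p_xy):
    """Mutual information in bits from a joint distribution {(x, y): p}."""
    p_x, p_y = {}, {}
    for (x, y), p in p_xy.items():
        p_x[x] = p_x.get(x, 0) + p
        p_y[y] = p_y.get(y, 0) + p
    return sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items() if p > 0)

copy_pair   = {(0, 0): 0.5, (1, 1): 0.5}                  # X determines Y completely
independent = {(0, 0): 0.25, (0, 1): 0.25,
               (1, 0): 0.25, (1, 1): 0.25}                # X says nothing about Y

print(mi(copy_pair))     # 1.0 bit: all of H(Y) is removed by knowing X
print(mi(independent))   # 0.0 bits: no uncertainty about Y is removed
</syntaxhighlight>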
第239行: |
第232行: |
| Note that in the discrete case <math>H(X|X) = 0</math> and therefore <math>H(X) = \operatorname{I}(X;X)</math>. Thus <math>\operatorname{I}(X; X) \ge \operatorname{I}(X; Y)</math>, and one can formulate the basic principle that a variable contains at least as much information about itself as any other variable can provide. | | Note that in the discrete case <math>H(X|X) = 0</math> and therefore <math>H(X) = \operatorname{I}(X;X)</math>. Thus <math>\operatorname{I}(X; X) \ge \operatorname{I}(X; Y)</math>, and one can formulate the basic principle that a variable contains at least as much information about itself as any other variable can provide. |
| | | |
− | Note that in the discrete case <math>H(X|X) = 0</math> and therefore <math>Ha(X) = \operatorname{I}(X;X)</math>. Thus <math>\operatorname{I}(X; X) \ge \operatorname{I}(X; Y)</math>, and one can formulate the basic principle that a variable contains at least as much information about itself as any other variable can provide.
| |
| | | |
| 注意,在离散情况下<math>H(X|X) = 0</math>,因此<math>H(X) = \operatorname{I}(X;X)</math>。于是<math>\operatorname{I}(X; X) \ge \operatorname{I}(X; Y)</math>,由此可以得出一个基本原则:一个变量所包含的关于它自身的信息,至少与任何其他变量所能提供的信息一样多。 | | 注意,在离散情况下<math>H(X|X) = 0</math>,因此<math>H(X) = \operatorname{I}(X;X)</math>。于是<math>\operatorname{I}(X; X) \ge \operatorname{I}(X; Y)</math>,由此可以得出一个基本原则:一个变量所包含的关于它自身的信息,至少与任何其他变量所能提供的信息一样多。 |
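A short check of this self-information property on an arbitrary toy distribution (the probabilities are our own choice):

<syntaxhighlight lang="python">
# Check that I(X;X) = H(X): the joint distribution of (X, X) puts all mass on the diagonal.
import math

p_x = {0: 0.2, 1: 0.3, 2: 0.5}                            # arbitrary toy distribution
p_xx = {(x, x): p for x, p in p_x.items()}                # joint distribution of the pair (X, X)

H_X  = -sum(p * math.log2(p) for p in p_x.values())
I_XX = sum(p * math.log2(p / (p_x[x] * p_x[y])) for (x, y), p in p_xx.items())

print(H_X, I_XX)   # both ≈ 1.485 bits
</syntaxhighlight>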