第173行: |
第173行: |
| | | |
| | | |
− | where <math>\Eta(X)</math> and <math>\Eta(Y)</math> are the marginal [[information entropy|entropies]], <math>\Eta(X|Y)</math> and <math>\Eta(Y|X)</math> are the [[conditional entropy|conditional entropies]], and <math>\Eta(X,Y)</math> is the [[joint entropy]] of <math>X</math> and <math>Y</math>. | + | where <math>H(X)</math> and <math>H(Y)</math> are the marginal [[information entropy|entropies]], <math>H(X|Y)</math> and <math>H(Y|X)</math> are the [[conditional entropy|conditional entropies]], and <math>H(X,Y)</math> is the [[joint entropy]] of <math>X</math> and <math>Y</math>. |
| | | |
− | where <math>\Eta(X)</math> and <math>\Eta(Y)</math> are the marginal entropies, <math>\Eta(X|Y)</math> and <math>\Eta(Y|X)</math> are the conditional entropies, and <math>\Eta(X,Y)</math> is the joint entropy of <math>X</math> and <math>Y</math>. | + | where <math>H(X)</math> and <math>H(Y)</math> are the marginal entropies, <math>H(X|Y)</math> and <math>H(Y|X)</math> are the conditional entropies, and <math>H(X,Y)</math> is the joint entropy of <math>X</math> and <math>Y</math>. |
| | | |
| 其中H(X)和H(Y)是边际熵,H(X | Y)和H(Y | X)是条件熵,H(X,Y)是<math>X</math>和<math>Y</math>的联合熵。 | | 其中H(X)和H(Y)是边际熵,H(X | Y)和H(Y | X)是条件熵,H(X,Y)是<math>X</math>和<math>Y</math>的联合熵。 |
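The identities above can be checked numerically. The following is a minimal sketch (not taken from the article): it builds a small, arbitrarily chosen joint distribution <math>p(x,y)</math>, computes the entropies from it, and verifies that <math>\operatorname{I}(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)</math>. The table <code>p_xy</code> and the helper <code>entropy</code> are illustrative assumptions only.

<syntaxhighlight lang="python">
# Minimal numerical sketch: check the entropy identities for mutual information
# on a small, arbitrarily chosen joint distribution p(x, y).
import numpy as np

# Hypothetical 2x3 joint probability table for (X, Y); any valid table works.
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.05, 0.30]])

p_x = p_xy.sum(axis=1)   # marginal p(x)
p_y = p_xy.sum(axis=0)   # marginal p(y)

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_X  = entropy(p_x)
H_Y  = entropy(p_y)
H_XY = entropy(p_xy.ravel())      # joint entropy H(X,Y)
H_X_given_Y = H_XY - H_Y          # chain rule: H(X|Y) = H(X,Y) - H(Y)
H_Y_given_X = H_XY - H_X          # chain rule: H(Y|X) = H(X,Y) - H(X)

# Mutual information directly from its definition.
I_XY = sum(p_xy[i, j] * np.log2(p_xy[i, j] / (p_x[i] * p_y[j]))
           for i in range(p_xy.shape[0])
           for j in range(p_xy.shape[1])
           if p_xy[i, j] > 0)

assert np.isclose(I_XY, H_X - H_X_given_Y)
assert np.isclose(I_XY, H_Y - H_Y_given_X)
assert np.isclose(I_XY, H_X + H_Y - H_XY)
print(f"I(X;Y) = {I_XY:.4f} bits")
</syntaxhighlight>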
第206行: |
第206行: |
| | | |
| | | |
− | Because <math>\operatorname{I}(X;Y)</math> is non-negative, consequently, <math>\Eta(X) \ge \Eta(X|Y)</math>. Here we give the detailed deduction of <math>\operatorname{I}(X;Y)=\Eta(Y)-\Eta(Y|X)</math> for the case of jointly discrete random variables: | + | Because <math>\operatorname{I}(X;Y)</math> is non-negative, consequently, <math>H(X) \ge H(X|Y)</math>. Here we give the detailed deduction of <math>\operatorname{I}(X;Y)=H(Y)-H(Y|X)</math> for the case of jointly discrete random variables: |
| | | |
| Because <math>\operatorname{I}(X;Y)</math> is non-negative, consequently, <math>\Eta(X) \ge \Eta(X|Y)</math>. Here we give the detailed deduction of <math>\operatorname{I}(X;Y)=\Eta(Y)-\Eta(Y|X)</math> for the case of jointly discrete random variables: | | Because <math>\operatorname{I}(X;Y)</math> is non-negative, consequently, <math>\Eta(X) \ge \Eta(X|Y)</math>. Here we give the detailed deduction of <math>\operatorname{I}(X;Y)=\Eta(Y)-\Eta(Y|X)</math> for the case of jointly discrete random variables: |
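For reference, the deduction referred to here proceeds along the following standard lines (shown as a sketch, with <math>p(x,y)</math>, <math>p(x)</math>, <math>p(y)</math> and <math>p(y|x)</math> denoting the joint, marginal and conditional probability mass functions; this generic notation is not copied from the article):

<math>
\begin{align}
\operatorname{I}(X;Y) &= \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \\
&= \sum_{x,y} p(x,y) \log \frac{p(y|x)}{p(y)} \\
&= -\sum_{y} p(y) \log p(y) + \sum_{x,y} p(x,y) \log p(y|x) \\
&= H(Y) - H(Y|X)
\end{align}
</math>

The second step uses <math>p(x,y) = p(x)\,p(y|x)</math>, and the last step uses the definitions of <math>H(Y)</math> and of the conditional entropy <math>H(Y|X)</math>.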
第221行: |
第221行: |
| The proofs of the other identities above are similar. The proof of the general case (not just discrete) is similar, with integrals replacing sums. | | The proofs of the other identities above are similar. The proof of the general case (not just discrete) is similar, with integrals replacing sums. |
| | | |
− | 同理,上述其他恒等式的证明方法都是是相似的。一般情况(不仅仅是离散情况)的证明是类似的,用积分代替求。 | + | 同理,上述其他恒等式的证明方法都是相似的。一般情况(不仅仅是离散情况)的证明是类似的,用积分代替求和。 |
| | | |
| | | |
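As a sketch of what "with integrals replacing sums" amounts to: in the jointly continuous case, with <math>p</math> now denoting probability density functions, the same identity reads

<math>
\operatorname{I}(X;Y) = \int_{\mathcal{Y}} \int_{\mathcal{X}} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \,dx\,dy = h(Y) - h(Y|X),
</math>

where <math>h</math> denotes [[differential entropy]], defined by the same formulas as above with the sums replaced by integrals.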
第227行: |
第227行: |
| | | |
| | | |
− | Intuitively, if entropy <math>\Eta(Y)</math> is regarded as a measure of uncertainty about a random variable, then <math>\Eta(Y|X)</math> is a measure of what <math>X</math> does ''not'' say about <math>Y</math>. This is "the amount of uncertainty remaining about <math>Y</math> after <math>X</math> is known", and thus the right side of the second of these equalities can be read as "the amount of uncertainty in <math>Y</math>, minus the amount of uncertainty in <math>Y</math> which remains after <math>X</math> is known", which is equivalent to "the amount of uncertainty in <math>Y</math> which is removed by knowing <math>X</math>". This corroborates the intuitive meaning of mutual information as the amount of information (that is, reduction in uncertainty) that knowing either variable provides about the other. | + | Intuitively, if entropy <math>H(Y)</math> is regarded as a measure of uncertainty about a random variable, then <math>H(Y|X)</math> is a measure of what <math>X</math> does ''not'' say about <math>Y</math>. This is "the amount of uncertainty remaining about <math>Y</math> after <math>X</math> is known", and thus the right side of the second of these equalities can be read as "the amount of uncertainty in <math>Y</math>, minus the amount of uncertainty in <math>Y</math> which remains after <math>X</math> is known", which is equivalent to "the amount of uncertainty in <math>Y</math> which is removed by knowing <math>X</math>". This corroborates the intuitive meaning of mutual information as the amount of information (that is, reduction in uncertainty) that knowing either variable provides about the other. |
| | | |
| Intuitively, if entropy <math>\Eta(Y)</math> is regarded as a measure of uncertainty about a random variable, then <math>\Eta(Y|X)</math> is a measure of what <math>X</math> does not say about <math>Y</math>. This is "the amount of uncertainty remaining about <math>Y</math> after <math>X</math> is known", and thus the right side of the second of these equalities can be read as "the amount of uncertainty in <math>Y</math>, minus the amount of uncertainty in <math>Y</math> which remains after <math>X</math> is known", which is equivalent to "the amount of uncertainty in <math>Y</math> which is removed by knowing <math>X</math>". This corroborates the intuitive meaning of mutual information as the amount of information (that is, reduction in uncertainty) that knowing either variable provides about the other. | | Intuitively, if entropy <math>\Eta(Y)</math> is regarded as a measure of uncertainty about a random variable, then <math>\Eta(Y|X)</math> is a measure of what <math>X</math> does not say about <math>Y</math>. This is "the amount of uncertainty remaining about <math>Y</math> after <math>X</math> is known", and thus the right side of the second of these equalities can be read as "the amount of uncertainty in <math>Y</math>, minus the amount of uncertainty in <math>Y</math> which remains after <math>X</math> is known", which is equivalent to "the amount of uncertainty in <math>Y</math> which is removed by knowing <math>X</math>". This corroborates the intuitive meaning of mutual information as the amount of information (that is, reduction in uncertainty) that knowing either variable provides about the other. |
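As a concrete toy illustration (an assumed example, not from the article): let <math>Y</math> be uniform over four equally likely values, so <math>H(Y) = 2</math> bits, and suppose that knowing <math>X</math> only reveals which of two pairs of values <math>Y</math> lies in, leaving <math>H(Y|X) = 1</math> bit of remaining uncertainty. Then <math>\operatorname{I}(X;Y) = H(Y) - H(Y|X) = 1</math> bit, which is exactly the amount of uncertainty about <math>Y</math> removed by learning <math>X</math>.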
第237行: |
第237行: |
| | | |
| | | |
− | Note that in the discrete case <math>\Eta(X|X) = 0</math> and therefore <math>\Eta(X) = \operatorname{I}(X;X)</math>. Thus <math>\operatorname{I}(X; X) \ge \operatorname{I}(X; Y)</math>, and one can formulate the basic principle that a variable contains at least as much information about itself as any other variable can provide. | + | Note that in the discrete case <math>H(X|X) = 0</math> and therefore <math>H(X) = \operatorname{I}(X;X)</math>. Thus <math>\operatorname{I}(X; X) \ge \operatorname{I}(X; Y)</math>, and one can formulate the basic principle that a variable contains at least as much information about itself as any other variable can provide. |
| | | |
− | Note that in the discrete case <math>\Eta(X|X) = 0</math> and therefore <math>\Eta(X) = \operatorname{I}(X;X)</math>. Thus <math>\operatorname{I}(X; X) \ge \operatorname{I}(X; Y)</math>, and one can formulate the basic principle that a variable contains at least as much information about itself as any other variable can provide. | + | Note that in the discrete case <math>H(X|X) = 0</math> and therefore <math>H(X) = \operatorname{I}(X;X)</math>. Thus <math>\operatorname{I}(X; X) \ge \operatorname{I}(X; Y)</math>, and one can formulate the basic principle that a variable contains at least as much information about itself as any other variable can provide. |
| | | |
| 注意,在离散情况下<math>\Eta(X|X) = 0</math>,因此<math>\Eta(X) = \operatorname{I}(X;X)</math>。因此<math>\operatorname{I}(X;X) \ge \operatorname{I}(X;Y)</math>,我们可以公式化这样一个基本原则,即一个变量包含的关于它自身的信息至少与任何其他变量所能提供的信息一样多。 | | 注意,在离散情况下<math>\Eta(X|X) = 0</math>,因此<math>\Eta(X) = \operatorname{I}(X;X)</math>。因此<math>\operatorname{I}(X;X) \ge \operatorname{I}(X;Y)</math>,我们可以公式化这样一个基本原则,即一个变量包含的关于它自身的信息至少与任何其他变量所能提供的信息一样多。 |
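Spelling out the step referenced above (a short sketch in the notation of the new revision): since <math>H(X|X) = 0</math>,

<math>
\operatorname{I}(X;X) = H(X) - H(X|X) = H(X), \qquad \operatorname{I}(X;Y) = H(X) - H(X|Y) \le H(X) = \operatorname{I}(X;X),
</math>

where the inequality uses <math>H(X|Y) \ge 0</math>, which holds for discrete <math>X</math>.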