This entry was translated by Jie and reviewed by Lincent.

{{Short description|Measure of relative information in probability theory}}
{{Information theory}}

[[File:Entropy-mutual-information-relative-entropy-relation-diagram.svg|thumb|right|Venn diagram showing additive and subtractive relationships among the information measures associated with the correlated variables <math>X</math> and <math>Y</math>. The area contained by both circles is the joint entropy <math>H(X,Y)</math>. The circle on the left (red and violet) is the individual entropy <math>H(X)</math>, with the red part being the conditional entropy <math>H(X|Y)</math>. The circle on the right (blue and violet) is <math>H(Y)</math>, with the blue part being <math>H(Y|X)</math>. The violet region in the middle is the mutual information <math>\operatorname{I}(X;Y)</math>.]]

In [[information theory]], the '''conditional entropy''' quantifies the amount of information needed to describe the outcome of a [[random variable]] <math>Y</math> given that the value of another random variable <math>X</math> is known. Here, information is measured in [[Shannon (unit)|shannon]]s, [[Nat (unit)|nat]]s, or [[Hartley (unit)|hartley]]s. The ''entropy of <math>Y</math> conditioned on <math>X</math>'' is written as <math>H(Y|X)</math>.

== Definition ==
The conditional entropy of <math>Y</math> given <math>X</math> is defined as

{{Equation box 1
|indent =
|title=
|equation = <math>H(Y|X)\ = -\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log \frac {p(x,y)}{p(x)}</math>
|cellpadding= 6
|border
|border colour = #0073CF
|background colour=#F5FFFA}}

where <math>\mathcal X</math> and <math>\mathcal Y</math> denote the [[Support (mathematics)|support sets]] of <math>X</math> and <math>Y</math>.

''Note:'' By convention, the expressions <math>0 \log 0</math> and <math>0 \log c/0</math> for fixed <math>c > 0</math> are treated as being equal to zero. This is because <math>\lim_{\theta\to0^+} \theta\, \log \,c/\theta = 0</math> and <math>\lim_{\theta\to0^+} \theta\, \log \theta = 0</math>.<ref>{{Cite web|url=http://www.inference.org.uk/mackay/itprnn/book.html|title=David MacKay: Information Theory, Pattern Recognition and Neural Networks: The Book|website=www.inference.org.uk|access-date=2019-10-25}}</ref>

Intuitive explanation of the definition: According to the definition, <math>\displaystyle H( Y|X) =\mathbb{E}( \ f( X,Y) \ )</math>, where <math>\displaystyle f:( x,y) \ \rightarrow -\log( \ p( y|x) \ )</math>. The function <math>\displaystyle f</math> associates to <math>\displaystyle ( x,y)</math> the information content of <math>\displaystyle ( Y=y)</math> given <math>\displaystyle (X=x)</math>, which is the amount of information needed to describe the event <math>\displaystyle (Y=y)</math> given <math>(X=x)</math>. According to the law of large numbers, <math>\displaystyle H(Y|X)</math> is the arithmetic mean of a large number of independent realizations of <math>\displaystyle f(X,Y)</math>.
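
As a concrete illustration of the formula above, the following minimal Python sketch evaluates <math>H(Y|X)</math> for a small joint distribution; the particular joint table and the use of base-2 logarithms (bits) are arbitrary choices made only for this example.

<syntaxhighlight lang="python">
import math

# Hypothetical joint distribution p(x, y) over X in {0, 1} and Y in {0, 1};
# the numbers are made up for illustration and must sum to 1.
p_xy = {
    (0, 0): 0.4, (0, 1): 0.1,
    (1, 0): 0.2, (1, 1): 0.3,
}

def conditional_entropy(p_xy):
    """H(Y|X) = -sum_{x,y} p(x,y) * log2(p(x,y) / p(x)), using the 0*log(0) = 0 convention."""
    p_x = {}
    for (x, _y), p in p_xy.items():          # marginal p(x) = sum_y p(x, y)
        p_x[x] = p_x.get(x, 0.0) + p
    return -sum(p * math.log2(p / p_x[x])
                for (x, _y), p in p_xy.items() if p > 0)

print(conditional_entropy(p_xy))  # about 0.846 bits for this table
</syntaxhighlight>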


== Motivation ==
Let <math>H(Y|X=x)</math> be the [[Shannon Entropy|entropy]] of the discrete random variable <math>Y</math> conditioned on the discrete random variable <math>X</math> taking a certain value <math>x</math>. Denote the support sets of <math>X</math> and <math>Y</math> by <math>\mathcal X</math> and <math>\mathcal Y</math>. Let <math>Y</math> have [[probability mass function]] <math>p_Y{(y)}</math>. The unconditional entropy of <math>Y</math> is calculated as <math>H(Y):=\mathbb{E}[\operatorname{I}(Y)]</math>, i.e.
:<math>H(Y) = \sum_{y\in\mathcal Y} {\mathrm{Pr}(Y=y)\,\mathrm{I}(y)}
= -\sum_{y\in\mathcal Y} {p_Y(y) \log_2{p_Y(y)}},</math>

where <math>\operatorname{I}(y_i)</math> is the [[information content]] of the [[Outcome (probability)|outcome]] of <math>Y</math> taking the value <math>y_i</math>. The entropy of <math>Y</math> conditioned on <math>X</math> taking the value <math>x</math> is defined analogously by [[conditional expectation]]:
:<math>H(Y|X=x)
= -\sum_{y\in\mathcal Y} {\Pr(Y = y|X=x) \log_2{\Pr(Y = y|X=x)}}.</math>

Note that <math>H(Y|X)</math> is the result of averaging <math>H(Y|X=x)</math> over all possible values <math>x</math> that <math>X</math> may take. Also, if the above sum is taken over a sample <math>y_1, \dots, y_n</math>, the expected value <math>E_X[ H(y_1, \dots, y_n \mid X = x)]</math> is known in some domains as '''equivocation'''.<ref>{{cite journal|author1=Hellman, M.|author2=Raviv, J.|year=1970|title=Probability of error, equivocation, and the Chernoff bound|journal=IEEE Transactions on Information Theory|volume=16|issue=4|pp=368-372}}</ref>

Given [[Discrete random variable|discrete random variables]] <math>X</math> with image <math>\mathcal X</math> and <math>Y</math> with image <math>\mathcal Y</math>, the conditional entropy of <math>Y</math> given <math>X</math> is defined as the weighted sum of <math>H(Y|X=x)</math> for each possible value of <math>x</math>, using <math>p(x)</math> as the weights:<ref name=cover1991>{{cite book|isbn=0-471-06259-6|year=1991|authorlink1=Thomas M. Cover|author1=T. Cover|author2=J. Thomas|title=Elements of Information Theory|url=https://archive.org/details/elementsofinform0000cove|url-access=registration}}</ref>{{rp|15}}
:<math>
\begin{align}
H(Y|X)\ &\equiv \sum_{x\in\mathcal X}\,p(x)\,H(Y|X=x)\\
 &=-\sum_{x\in\mathcal X} p(x)\sum_{y\in\mathcal Y}\,p(y|x)\,\log_2\, p(y|x)\\
 &=-\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}\,p(x,y)\,\log_2\,p(y|x)\\
 &=-\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log_2\frac{p(x,y)}{p(x)}.
\end{align}
</math>
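
The first and last lines of this derivation can be compared numerically; the following self-contained Python sketch uses a made-up joint table and confirms that the weighted sum over <math>x</math> and the direct double sum agree:

<syntaxhighlight lang="python">
import math

# Same illustrative joint distribution as in the earlier sketch.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
xs = {x for x, _ in p_xy}
ys = {y for _, y in p_xy}
p_x = {x: sum(p_xy[x, y] for y in ys) for x in xs}

# H(Y|X=x): entropy of the conditional distribution p(y|x) for one fixed x.
def h_y_given_x_equals(x):
    return -sum((p_xy[x, y] / p_x[x]) * math.log2(p_xy[x, y] / p_x[x])
                for y in ys if p_xy[x, y] > 0)

# Weighted sum over x ...
h_weighted = sum(p_x[x] * h_y_given_x_equals(x) for x in xs)
# ... agrees with the direct double sum -sum p(x,y) log2 p(x,y)/p(x).
h_direct = -sum(p * math.log2(p / p_x[x]) for (x, _), p in p_xy.items() if p > 0)
assert abs(h_weighted - h_direct) < 1e-12
print(h_weighted, h_direct)  # both about 0.846 bits
</syntaxhighlight>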
== Properties ==

=== Conditional entropy equals zero ===
<math>H(Y|X)=0</math> if and only if the value of <math>Y</math> is completely determined by the value of <math>X</math>.

=== Conditional entropy of independent random variables ===
Conversely, <math>H(Y|X) = H(Y)</math> if and only if <math>Y</math> and <math>X</math> are [[independent random variables]].
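
A quick numerical illustration of this property, using a joint table deliberately constructed as the product of two made-up marginals:

<syntaxhighlight lang="python">
import math

p_x = {0: 0.3, 1: 0.7}            # made-up marginal of X
p_y = {0: 0.5, 1: 0.25, 2: 0.25}  # made-up marginal of Y
p_xy = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}  # independent joint

h_y = -sum(p * math.log2(p) for p in p_y.values())
h_y_given_x = -sum(p * math.log2(p / p_x[x]) for (x, _), p in p_xy.items())
assert abs(h_y - h_y_given_x) < 1e-12
print(h_y, h_y_given_x)  # identical: H(Y|X) = H(Y) under independence
</syntaxhighlight>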


=== Chain rule ===
Assume that the combined system determined by two random variables <math>X</math> and <math>Y</math> has [[joint entropy]] <math>H(X,Y)</math>, that is, we need <math>H(X,Y)</math> bits of information on average to describe its exact state. Now if we first learn the value of <math>X</math>, we have gained <math>H(X)</math> bits of information. Once <math>X</math> is known, we only need <math>H(X,Y)-H(X)</math> bits to describe the state of the whole system. This quantity is exactly <math>H(Y|X)</math>, which gives the ''chain rule'' of conditional entropy:

:<math>H(Y|X)\, = \, H(X,Y)- H(X).</math><ref name=cover1991 />{{rp|17}}

The chain rule follows from the above definition of conditional entropy:
:<math>\begin{align}
H(Y|X) &= \sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log \left(\frac{p(x)}{p(x,y)} \right) \\
 &= -\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log\,p(x,y) + \sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log\,p(x) \\
 &= H(X,Y) + \sum_{x \in \mathcal X} p(x)\log\,p(x) \\
 &= H(X,Y) - H(X).
\end{align}</math>

In general, a chain rule for multiple random variables holds:
:<math> H(X_1,X_2,\ldots,X_n) =
\sum_{i=1}^n H(X_i | X_1, \ldots, X_{i-1}) </math><ref name=cover1991 />{{rp|22}}

It has a similar form to the [[Chain rule (probability)|chain rule]] in probability theory, except that addition instead of multiplication is used.
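
The chain rule is straightforward to verify numerically. The following self-contained Python sketch checks <math>H(Y|X) = H(X,Y) - H(X)</math> on another hypothetical joint table:

<syntaxhighlight lang="python">
import math

p_xy = {(0, 0): 0.125, (0, 1): 0.375, (1, 0): 0.3, (1, 1): 0.2}  # hypothetical joint table
xs = {x for x, _ in p_xy}
ys = {y for _, y in p_xy}
p_x = {x: sum(p_xy[x, y] for y in ys) for x in xs}

def entropy(dist):
    """Shannon entropy in bits of a probability table given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

h_xy = entropy(p_xy)                       # joint entropy H(X,Y)
h_x = entropy(p_x)                         # marginal entropy H(X)
h_y_given_x = -sum(p * math.log2(p / p_x[x])
                   for (x, _), p in p_xy.items() if p > 0)

assert abs(h_y_given_x - (h_xy - h_x)) < 1e-12  # chain rule: H(Y|X) = H(X,Y) - H(X)
print(h_y_given_x, h_xy - h_x)
</syntaxhighlight>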


=== Bayes' rule ===
[[Bayes' rule]] for conditional entropy states

:<math>H(Y|X)\,=\,H(X|Y) - H(X) + H(Y).</math>

''Proof.'' <math>H(Y|X) = H(X,Y) - H(X)</math> and <math>H(X|Y) = H(Y,X) - H(Y)</math>. Symmetry entails <math>H(X,Y) = H(Y,X)</math>. Subtracting the two equations implies Bayes' rule.

If <math>Y</math> is [[Conditional independence|conditionally independent]] of <math>Z</math> given <math>X</math> we have:

:<math>H(Y|X,Z)\,=\,H(Y|X).</math>

=== Other properties ===
For any <math>X</math> and <math>Y</math>:
:<math>\begin{align}
  H(Y|X) &\le H(Y) \, \\
  H(X,Y) &= H(X|Y) + H(Y|X) + \operatorname{I}(X;Y),\qquad \\
  H(X,Y) &= H(X) + H(Y) - \operatorname{I}(X;Y),\, \\
  \operatorname{I}(X;Y) &\le H(X),\,
\end{align}</math>

where <math>\operatorname{I}(X;Y)</math> is the [[mutual information]] between <math>X</math> and <math>Y</math>.

− | | + | 对于独立的<math>X</math>和<math>Y</math>: |
− | For independent <math>X</math> and <math>Y</math>:
| |
− | | |
− | 对于独立的X和Y:
| |
| | | |
| | | |
| :<math>H(Y|X) = H(Y) </math> and <math>H(X|Y) = H(X) \, </math> | | :<math>H(Y|X) = H(Y) </math> and <math>H(X|Y) = H(X) \, </math> |
| | | |
− |
| |
Although the specific-conditional entropy <math>H(X|Y=y)</math> can be either less or greater than <math>H(X)</math> for a given [[random variate]] <math>y</math> of <math>Y</math>, <math>H(X|Y)</math> can never exceed <math>H(X)</math>.
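
These identities, together with Bayes' rule from the previous subsection, can be checked on a small discrete example; the joint table below is again a hypothetical choice:

<syntaxhighlight lang="python">
import math

p_xy = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.15, (1, 1): 0.45}  # assumed joint table
xs = {x for x, _ in p_xy}
ys = {y for _, y in p_xy}
p_x = {x: sum(p_xy[x, y] for y in ys) for x in xs}
p_y = {y: sum(p_xy[x, y] for x in xs) for y in ys}

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

h_x, h_y, h_xy = entropy(p_x), entropy(p_y), entropy(p_xy)
h_y_given_x = -sum(p * math.log2(p / p_x[x]) for (x, y), p in p_xy.items() if p > 0)
h_x_given_y = -sum(p * math.log2(p / p_y[y]) for (x, y), p in p_xy.items() if p > 0)
mi = h_x + h_y - h_xy                                          # mutual information I(X;Y)

assert h_y_given_x <= h_y + 1e-12                              # H(Y|X) <= H(Y)
assert abs(h_xy - (h_x_given_y + h_y_given_x + mi)) < 1e-12    # H(X,Y) = H(X|Y)+H(Y|X)+I(X;Y)
assert mi <= h_x + 1e-12                                       # I(X;Y) <= H(X)
assert abs(h_y_given_x - (h_x_given_y - h_x + h_y)) < 1e-12    # Bayes' rule for conditional entropy
print(h_y_given_x, h_x_given_y, mi)
</syntaxhighlight>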


== Conditional differential entropy ==

=== Definition ===
The above definition is for discrete random variables. The continuous version of discrete conditional entropy is called ''conditional differential (or continuous) entropy''. Let <math>X</math> and <math>Y</math> be continuous random variables with a [[joint probability density function]] <math>f(x,y)</math>. The differential conditional entropy <math>h(X|Y)</math> is defined as<ref name=cover1991 />{{rp|249}}
:<math>h(X|Y) = -\int_{\mathcal X, \mathcal Y} f(x,y)\log f(x|y)\,dx\,dy.</math>

=== Properties ===
In contrast to the conditional entropy for discrete random variables, the conditional differential entropy may be negative.

As in the discrete case there is a chain rule for differential entropy:
:<math>h(Y|X)\,=\,h(X,Y)-h(X)</math><ref name=cover1991 />{{rp|253}}

Notice however that this rule may not be true if the involved differential entropies do not exist or are infinite.
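
For jointly Gaussian variables the differential entropies have closed forms, which makes the properties above easy to check. The sketch below assumes a standard bivariate normal with correlation <math>\rho=0.99</math> (an arbitrary choice) and shows both that <math>h(Y|X)</math> can be negative and that the differential chain rule holds:

<syntaxhighlight lang="python">
import math

def h_gaussian(var):
    """Differential entropy (in nats) of a 1-D Gaussian with the given variance."""
    return 0.5 * math.log(2 * math.pi * math.e * var)

rho = 0.99                      # assumed correlation of a standard bivariate normal
det_cov = 1.0 - rho**2          # determinant of the covariance matrix [[1, rho], [rho, 1]]

h_x = h_gaussian(1.0)
h_y = h_gaussian(1.0)
h_xy = 0.5 * math.log((2 * math.pi * math.e) ** 2 * det_cov)   # joint differential entropy
h_y_given_x = h_gaussian(1.0 - rho**2)                         # Y | X=x is N(rho*x, 1 - rho^2)

print(h_y_given_x)                                  # about -0.54 nats: negative!
assert abs(h_y_given_x - (h_xy - h_x)) < 1e-12      # chain rule: h(Y|X) = h(X,Y) - h(X)
assert h_y_given_x <= h_y                           # conditioning does not increase differential entropy
</syntaxhighlight>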

Joint differential entropy is also used in the definition of the [[mutual information]] between continuous random variables:

:<math>\operatorname{I}(X,Y)=h(X)+h(Y)-h(X,Y).</math>

<math>h(X|Y) \le h(X)</math> with equality if and only if <math>X</math> and <math>Y</math> are independent.<ref name=cover1991 />{{rp|253}}

=== Relation to estimator error ===
The conditional differential entropy yields a lower bound on the expected squared error of an [[estimator]]. For any random variable <math>X</math>, observation <math>Y</math> and estimator <math>\widehat{X}</math> the following holds:<ref name=cover1991 />{{rp|255}}

:<math>\mathbb{E}\left[\bigl(X - \widehat{X}{(Y)}\bigr)^2\right] \ge \frac{1}{2\pi e}e^{2h(X|Y)}</math>
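
For a standard bivariate normal with correlation <math>\rho</math> (the value below is an arbitrary choice), the linear estimator <math>\widehat{X}(Y)=\rho Y</math> attains this bound with equality, which gives a simple numerical check:

<syntaxhighlight lang="python">
import math

rho = 0.8                                   # assumed correlation, standard bivariate normal
mse = 1.0 - rho**2                          # E[(X - rho*Y)^2] for the estimator rho*Y
h_x_given_y = 0.5 * math.log(2 * math.pi * math.e * (1.0 - rho**2))   # h(X|Y) in nats
bound = math.exp(2 * h_x_given_y) / (2 * math.pi * math.e)

print(mse, bound)                           # both 0.36: the bound is tight here
assert mse >= bound - 1e-12
</syntaxhighlight>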

This is related to the [[uncertainty principle]] from [[quantum mechanics]].

== Generalization to quantum theory ==
In [[quantum information theory]], the conditional entropy is generalized to the [[conditional quantum entropy]]. The latter can take negative values, unlike its classical counterpart.

== See also ==
* [[Entropy (information theory)]]
* [[Mutual information]]
* [[Conditional quantum entropy]]
* [[Variation of information]]
* [[Entropy power inequality]]
* [[Likelihood function]]

== References ==
{{Reflist}}

[[Category:Entropy and information]]
[[Category:Information theory]]