This entry was translated by Jie and reviewed by Lincent.
{{Short description|Measure of relative information in probability theory}}
{{Information theory}}
      
[[文件:Entropy-mutual-information-relative-entropy-relation-diagram.svg|缩略图|右|Venn diagram showing additive and subtractive relationships of the various information measures associated with correlated variables <math>X</math> and <math>Y</math>. The area contained by both circles is the joint entropy <math>H(X,Y)</math>. The circle on the left (red and violet) is the individual entropy <math>H(X)</math>, with the red part being the conditional entropy <math>H(X|Y)</math>. The circle on the right (blue and violet) is <math>H(Y)</math>, with the blue part being <math>H(Y|X)</math>. The violet part in the middle is the mutual information <math>\operatorname{I}(X;Y)</math>.]]
In [[information theory]], the '''conditional entropy''' quantifies the amount of information needed to describe the outcome of a [[random variable]] <math>Y</math> given that the value of another random variable <math>X</math> is known. Here, information is measured in [[Shannon (unit)|shannon]]s, [[Nat (unit)|nat]]s, or [[Hartley (unit)|hartley]]s. The ''entropy of <math>Y</math> conditioned on <math>X</math>'' is written as <math>H(Y\mid X)</math>.
         −
== Definition ==
The conditional entropy of <math>Y</math> given <math>X</math> is defined as
      
:<math>H(Y|X)\ = -\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log \frac {p(x,y)}{p(x)}</math>
where <math>\mathcal X</math> and <math>\mathcal Y</math> denote the [[Support (mathematics)|support sets]] of <math>X</math> and <math>Y</math>.
      
''Note:'' By convention, the expressions <math>0 \log 0</math> and <math>0 \log c/0</math> for fixed <math>c > 0</math> are treated as being equal to zero. This is because <math>\lim_{\theta\to0^+} \theta\, \log \,c/\theta = 0</math> and <math>\lim_{\theta\to0^+} \theta\, \log \theta = 0</math>.<ref>{{Cite web|url=http://www.inference.org.uk/mackay/itprnn/book.html|title=David MacKay: Information Theory, Pattern Recognition and Neural Networks: The Book|website=www.inference.org.uk|access-date=2019-10-25}}</ref>
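As a concrete illustration, here is a minimal Python sketch that evaluates the definition directly; the joint distribution and variable names are made up for this example and are not taken from any cited source:

<syntaxhighlight lang="python">
import math

# Hypothetical joint distribution p(x, y) with X in {0, 1} and Y in {0, 1, 2}.
p_xy = {
    (0, 0): 0.25, (0, 1): 0.25, (0, 2): 0.0,
    (1, 0): 0.0,  (1, 1): 0.25, (1, 2): 0.25,
}

# Marginal p(x) = sum over y of p(x, y).
p_x = {}
for (x, _), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

# H(Y|X) = -sum_{x,y} p(x,y) * log2(p(x,y) / p(x)),
# using the convention that terms with p(x,y) = 0 contribute zero.
H_Y_given_X = -sum(
    p * math.log2(p / p_x[x]) for (x, _), p in p_xy.items() if p > 0
)
print(H_Y_given_X)  # 1.0 bit for this particular distribution
</syntaxhighlight>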
Intuitive explanation of the definition: According to the definition, <math>\displaystyle H( Y|X) =\mathbb{E}( \ f( X,Y) \ )</math> where <math>\displaystyle f:( x,y) \ \rightarrow -\log( \ p( y|x) \ ) .</math> <math>\displaystyle f</math> associates to <math>\displaystyle ( x,y)</math> the information content of <math>\displaystyle ( Y=y)</math> given <math>\displaystyle (X=x)</math>, which is the amount of information needed to describe the event <math>\displaystyle (Y=y)</math> given <math>(X=x)</math>. According to the law of large numbers, <math>\displaystyle H(Y|X)</math> is the arithmetic mean of a large number of independent realizations of <math>\displaystyle f(X,Y)</math>.
== Motivation ==
Let <math>H(Y\mid X=x)</math> be the [[Shannon Entropy|entropy]] of the discrete random variable <math>Y</math> conditioned on the discrete random variable <math>X</math> taking a certain value <math>x</math>. Denote the support sets of <math>X</math> and <math>Y</math> by <math>\mathcal X</math> and <math>\mathcal Y</math>. Let <math>Y</math> have [[probability mass function]] <math>p_Y{(y)}</math>. The unconditional entropy of <math>Y</math> is calculated as <math>H(Y) := \mathbb{E}[\operatorname{I}(Y)]</math>, i.e.
      
:<math>H(Y) := \mathbb{E}[\operatorname{I}(Y)] = -\sum_{y\in\mathcal Y} {p_Y(y) \log_2{p_Y(y)}},</math>
where <math>\operatorname{I}(y_i)</math> is the [[information content]] of the [[Outcome (probability)|outcome]] of <math>Y</math> taking the value <math>y_i</math>. The entropy of <math>Y</math> conditioned on <math>X</math> taking the value <math>x</math> is defined analogously by [[conditional expectation]]:
      
:<math>H(Y|X=x) = -\sum_{y\in\mathcal Y} \Pr(Y = y\mid X=x) \log_2 {\Pr(Y = y\mid X=x)}.</math>
Note that <math>H(Y\mid X)</math> is the result of averaging <math>H(Y\mid X=x)</math> over all possible values <math>x</math> that <math>X</math> may take. Also, if the above sum is taken over a sample <math>y_1, \dots, y_n</math>, the expected value <math>E_X[ H(y_1, \dots, y_n \mid X = x)]</math> is known in some domains as '''equivocation'''.<ref>{{cite journal|author1=Hellman, M.|author2=Raviv, J.|year=1970|title=Probability of error, equivocation, and the Chernoff bound|journal=IEEE Transactions on Information Theory|volume=16|issue=4|pp=368-372}}</ref>
Given [[Discrete random variable|discrete random variables]] <math>X</math> with image <math>\mathcal X</math> and <math>Y</math> with image <math>\mathcal Y</math>, the conditional entropy of <math>Y</math> given <math>X</math> is defined as the weighted sum of <math>H(Y|X=x)</math> for each possible value of <math>x</math>, using  <math>p(x)</math> as the weights:<ref name=cover1991>{{cite book|isbn=0-471-06259-6|year=1991|authorlink1=Thomas M. Cover|author1=T. Cover|author2=J. Thomas|title=Elements of Information Theory|url=https://archive.org/details/elementsofinform0000cove|url-access=registration}}</ref>{{rp|15}}
      
:<math>
\begin{align}
H(Y|X)\ &\equiv \sum_{x\in\mathcal X}\,p(x)\,H(Y|X=x)\\
& =-\sum_{x\in\mathcal X} p(x)\sum_{y\in\mathcal Y}\,p(y|x)\,\log_2\, p(y|x)\\
& =-\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}\,p(x,y)\,\log_2\,p(y|x)\\
& =-\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log_2 \frac{p(x,y)}{p(x)}.
\end{align}
</math>
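A quick numerical sanity check of this identity, reusing the made-up joint distribution from the sketch in the Definition section (zero-probability pairs omitted):

<syntaxhighlight lang="python">
import math

# Illustrative joint distribution; only nonzero entries are listed.
p_xy = {(0, 0): 0.25, (0, 1): 0.25, (1, 1): 0.25, (1, 2): 0.25}
p_x = {0: 0.5, 1: 0.5}

def cond_entropy_at(x):
    """H(Y|X=x), computed from the conditional pmf p(y|x) = p(x,y)/p(x)."""
    return -sum(
        (p / p_x[x]) * math.log2(p / p_x[x])
        for (xx, _), p in p_xy.items() if xx == x
    )

weighted = sum(p_x[x] * cond_entropy_at(x) for x in p_x)          # sum_x p(x) H(Y|X=x)
direct = -sum(p * math.log2(p / p_x[x]) for (x, _), p in p_xy.items())
print(weighted, direct)  # both 1.0: the weighted sum equals the joint-distribution form
</syntaxhighlight>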
== Properties ==

=== Conditional entropy equals zero ===
<math>H(Y|X)=0</math> if and only if the value of <math>Y</math> is completely determined by the value of <math>X</math>.
 
=== Conditional entropy of independent random variables ===
Conversely, <math>H(Y|X) = H(Y)</math> if and only if <math>Y</math> and <math>X</math> are [[independent random variables]].
      
=== Chain rule ===
Assume that the combined system determined by two random variables <math>X</math> and <math>Y</math> has [[joint entropy]] <math>H(X,Y)</math>, that is, we need <math>H(X,Y)</math> bits of information on average to describe its exact state. Now if we first learn the value of <math>X</math>, we have gained <math>H(X)</math> bits of information. Once <math>X</math> is known, we only need <math>H(X,Y)-H(X)</math> bits to describe the state of the whole system. This quantity is exactly <math>H(Y|X)</math>, which gives the ''chain rule'' of conditional entropy:
:<math>H(Y|X)\, = \, H(X,Y)- H(X).</math><ref name=cover1991 />{{rp|17}}
 
The chain rule follows from the above definition of conditional entropy:
      
:<math>\begin{align}
H(Y|X) &= -\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log \frac{p(x,y)}{p(x)} \\
&= -\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log p(x,y) + \sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log p(x) \\
&= H(X,Y) + \sum_{x \in \mathcal X} p(x)\log p(x) \\
&= H(X,Y) - H(X).
\end{align}</math>
In general, a chain rule for multiple random variables holds:
      
:<math> H(X_1,X_2,\ldots,X_n) =
  \sum_{i=1}^n H(X_i | X_1, \ldots, X_{i-1}) </math><ref name=cover1991 />{{rp|22}}
It has a similar form to the [[Chain rule (probability)|chain rule]] in probability theory, except that addition instead of multiplication is used.
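A short numerical check of the two-variable chain rule, again with the illustrative distribution used above (a sketch, not part of the cited material):

<syntaxhighlight lang="python">
import math

# Same illustrative joint distribution as above.
p_xy = {(0, 0): 0.25, (0, 1): 0.25, (1, 1): 0.25, (1, 2): 0.25}
p_x = {0: 0.5, 1: 0.5}

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

H_XY = entropy(p_xy)   # joint entropy H(X,Y) = 2.0 bits
H_X = entropy(p_x)     # marginal entropy H(X) = 1.0 bit
H_Y_given_X = -sum(p * math.log2(p / p_x[x]) for (x, _), p in p_xy.items())

print(H_Y_given_X, H_XY - H_X)  # both 1.0: H(Y|X) = H(X,Y) - H(X)
</syntaxhighlight>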
=== Bayes' rule ===
[[Bayes' rule]] for conditional entropy states
:<math>H(Y|X) \,=\, H(X|Y) - H(X) + H(Y).</math>
''Proof.'' <math>H(Y|X) = H(X,Y) - H(X)</math> and <math>H(X|Y) = H(Y,X) - H(Y)</math>. Symmetry entails <math>H(X,Y) = H(Y,X)</math>. Subtracting the two equations implies Bayes' rule.
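The same toy distribution can be used to check Bayes' rule numerically (an illustrative sketch, not part of the proof):

<syntaxhighlight lang="python">
import math

# Same illustrative joint distribution as in the earlier sketches.
p_xy = {(0, 0): 0.25, (0, 1): 0.25, (1, 1): 0.25, (1, 2): 0.25}

def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(pmf, axis):
    out = {}
    for pair, p in pmf.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out

H_X, H_Y, H_XY = entropy(marginal(p_xy, 0)), entropy(marginal(p_xy, 1)), entropy(p_xy)
H_Y_given_X, H_X_given_Y = H_XY - H_X, H_XY - H_Y

# Bayes' rule for conditional entropy: H(Y|X) = H(X|Y) - H(X) + H(Y).
print(math.isclose(H_Y_given_X, H_X_given_Y - H_X + H_Y))  # True
</syntaxhighlight>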
If <math>Y</math> is [[Conditional independence|conditionally independent]] of <math>Z</math> given <math>X</math> we have:
      
:<math>H(Y|X,Z) \,=\, H(Y|X).</math>
 
=== Other properties ===
For any <math>X</math> and <math>Y</math>:
:<math>\begin{align}
  H(Y|X) &\le H(Y) \, \\
  H(X,Y) &= H(X|Y) + H(Y|X) + \operatorname{I}(X;Y),\qquad \\
  H(X,Y) &= H(X) + H(Y) - \operatorname{I}(X;Y),\, \\
  \operatorname{I}(X;Y) &\le H(X),\,
\end{align}</math>
where <math>\operatorname{I}(X;Y)</math> is the [[mutual information]] between <math>X</math> and <math>Y</math>.
For independent <math>X</math> and <math>Y</math>:
:<math>H(Y|X) = H(Y) </math> and <math>H(X|Y) = H(X) \, </math>
 
Although the specific conditional entropy <math>H(X|Y=y)</math> can be either less or greater than <math>H(X)</math> for a given [[random variate]] <math>y</math> of <math>Y</math>, <math>H(X|Y)</math> can never exceed <math>H(X)</math>.
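These identities can be checked numerically; the sketch below, using the same made-up distribution as in the earlier examples, verifies two of them:

<syntaxhighlight lang="python">
import math

# Same illustrative joint distribution as in the earlier sketches.
p_xy = {(0, 0): 0.25, (0, 1): 0.25, (1, 1): 0.25, (1, 2): 0.25}

def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(pmf, axis):
    out = {}
    for pair, p in pmf.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out

H_X, H_Y, H_XY = entropy(marginal(p_xy, 0)), entropy(marginal(p_xy, 1)), entropy(p_xy)
H_Y_given_X = H_XY - H_X          # 1.0 bit
H_X_given_Y = H_XY - H_Y          # 0.5 bit
I_XY = H_X + H_Y - H_XY           # mutual information, 0.5 bit

print(H_Y_given_X <= H_Y)                                      # True: conditioning cannot increase entropy
print(math.isclose(H_XY, H_X_given_Y + H_Y_given_X + I_XY))    # True: H(X,Y) = H(X|Y) + H(Y|X) + I(X;Y)
</syntaxhighlight>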
== Conditional differential entropy ==

=== Definition ===
The above definition is for discrete random variables. The continuous version of discrete conditional entropy is called ''conditional differential (or continuous) entropy''. Let <math>X</math> and <math>Y</math> be continuous random variables with a [[joint probability density function]] <math>f(x,y)</math>. The differential conditional entropy <math>h(X|Y)</math> is defined as<ref name=cover1991 />{{rp|249}}
:<math>h(X|Y) = -\int_{\mathcal X, \mathcal Y} f(x,y)\log f(x|y)\,dx dy</math>
=== Properties ===
In contrast to the conditional entropy for discrete random variables, the conditional differential entropy may be negative.
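For instance (an illustrative, standard Gaussian calculation, added here for concreteness): if <math>(X,Y)</math> is jointly Gaussian with unit variances and correlation coefficient <math>\rho</math>, the conditional distribution of <math>X</math> given <math>Y=y</math> is Gaussian with variance <math>1-\rho^2</math>, so
:<math>h(X|Y) = \frac{1}{2}\log\left(2\pi e (1-\rho^2)\right),</math>
which is negative whenever <math>1-\rho^2 < 1/(2\pi e)</math>, i.e. when the two variables are sufficiently strongly correlated.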
As in the discrete case there is a chain rule for differential entropy:
      
:<math>h(Y|X)\,=\,h(X,Y)-h(X)</math><ref name=cover1991 />{{rp|253}}
 
Notice however that this rule may not be true if the involved differential entropies do not exist or are infinite.
      
Joint differential entropy is also used in the definition of the [[mutual information]] between continuous random variables:
:<math>\operatorname{I}(X,Y)=h(X)-h(X|Y)=h(Y)-h(Y|X)</math>
<math>h(X|Y) \le h(X)</math> with equality if and only if <math>X</math> and <math>Y</math> are independent.<ref name=cover1991 />{{rp|253}}
        −
=== Relation to estimator error ===
The conditional differential entropy yields a lower bound on the expected squared error of an [[estimator]]. For any random variable <math>X</math>, observation <math>Y</math> and estimator <math>\widehat{X}</math> the following holds:<ref name=cover1991 />{{rp|255}}
      +
=== 与估计量误差的关系Relation to estimator error  ===
 
条件微分熵在估计量的期望平方误差上有一个下限。对于任何随机变量<math>X</math>,观察值<math>Y</math>和估计量<math>\widehat{X}</math>,以下条件成立:
 
条件微分熵在估计量的期望平方误差上有一个下限。对于任何随机变量<math>X</math>,观察值<math>Y</math>和估计量<math>\widehat{X}</math>,以下条件成立:
   第265行: 第196行:       −
This is related to the [[uncertainty principle]] from [[quantum mechanics]].
== Generalization to quantum theory ==
In [[quantum information theory]], the conditional entropy is generalized to the [[conditional quantum entropy]]. The latter can take negative values, unlike its classical counterpart.
== See also ==
* [[Entropy (information theory)]]
* [[Mutual information]]
* [[Conditional quantum entropy]]
* [[Variation of information]]
* [[Entropy power inequality]]
* [[Likelihood function]]
     −
      +
== References ==
{{Reflist}}
 
[[Category:Entropy and information]]
[[Category:Information theory]]