更改

条件熵 (查看源代码)

2021年1月17日 (日) 23:59的版本

删除103字节、 2021年1月17日 (日) 23:59

第27行：第27行： −

'''注意'''：约定<math>c > 0</math>始终成立时，表达式<math>0 \log 0</math>和<math>0 \log c/0</math>视为等于零。这是因为<math>\lim_{\theta\to0^+} \theta\, \log \,c/\theta = 0</math>，而且<math>\lim_{\theta\to0^+} \theta\, \log \theta = 0</math>><ref>{{Cite web|url=http://www.inference.org.uk/mackay/itprnn/book.html|title=David MacKay: Information Theory, Pattern Recognition and Neural Networks: The Book|website=www.inference.org.uk|access-date=2019-10-25}}</ref> ~~~~

+

'''注意'''：约定<math>c > 0</math>始终成立时，表达式<math>0 \log 0</math>和<math>0 \log c/0</math>视为等于零。这是因为<math>\lim_{\theta\to0^+} \theta\, \log \,c/\theta = 0</math>，而且<math>\lim_{\theta\to0^+} \theta\, \log \theta = 0</math><ref>{{Cite web|url=http://www.inference.org.uk/mackay/itprnn/book.html|title=David MacKay: Information Theory, Pattern Recognition and Neural Networks: The Book|website=www.inference.org.uk|access-date=2019-10-25}}</ref>

−

对该定义的直观解释是：根据定义<math>\displaystyle H( Y|X) =\mathbb{E}( \ f( X,Y) \ )</math>，其中<math>\displaystyle f:( x,y) \ \rightarrow -\log( \ p( y|x) \ ) </math>. <math>\displaystyle f</math>将给定<math>\displaystyle (X=x)</math>时的<math>\displaystyle ( Y=y)</math>的信息内容与<math>\displaystyle ( x,y)</math>相关联，这是描述在给定<math>(X=x)</math>条件下的事件<math>\displaystyle (Y=y)</math>所需的信息量。根据大数定律，<math>H（Y ǀ X）</math>是大量<math>\displaystyle f(X,Y)</math>独立实验结果的算术平均值。

+

对该定义的直观解释是：根据定义<math>\displaystyle H( Y|X) =\mathbb{E}( \ f( X,Y) \ )</math>，其中<math>\displaystyle f:( x,y) \ \rightarrow -\log( \ p( y|x) \ ) </math>。 <math>\displaystyle f</math>将给定<math>\displaystyle (X=x)</math>时的<math>\displaystyle ( Y=y)</math>的信息内容与<math>\displaystyle ( x,y)</math>相关联，这是描述在给定<math>(X=x)</math>条件下的事件<math>\displaystyle (Y=y)</math>所需的信息量。根据大数定律，<math>H（Y ǀ X）</math>是大量<math>\displaystyle f(X,Y)</math>独立实验结果的算术平均值。

== 动机 ==

薄荷

7,129

个编辑