此词条暂由Henry翻译。

由CecileLi初步审校。
<font color="#ff8000">微分熵 Differential entropy</font>(也被称为连续熵)是信息论中的一个概念,其来源于香农尝试将他的香农熵概念扩展到连续的概率分布。香农熵是衡量一个随机变量的平均惊异程度的指标。可惜的是,香农只是假设它是离散熵的正确连续模拟,而并没有推导出这一公式;但事实上,它并不是离散熵的正确连续模拟。

==Definition==
定义
Let <math>X</math> be a random variable with a [[probability density function]] <math>f</math> whose [[support (mathematics)|support]] is a set <math>\mathcal X</math>. The ''differential entropy'' <math>h(X)</math> or <math>h(f)</math> is defined as<ref name="cover_thomas">{{cite book|first1=Thomas M.|first2=Joy A.|last1=Cover|last2=Thomas|isbn=0-471-06259-6|title=Elements of Information Theory|year=1991|publisher=Wiley|location=New York|url=https://archive.org/details/elementsofinform0000cove|url-access=registration}}</ref>{{rp|243}}

--[[用户:CecileLi|CecileLi]]([[用户讨论:CecileLi|讨论]]) 【审校】此处缺无格式的英文及翻译 补充:设随机变量 <math>X</math>,其概率密度函数 <math>f</math> 的支撑集是集合 <math>\mathcal X</math>,则微分熵 <math>h(X)</math> 或 <math>h(f)</math> 定义为

:<math>h(X) = -\int_\mathcal{X} f(x)\log f(x)\,dx</math>
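The following is a minimal numerical sketch of this definition (assuming NumPy and SciPy are available; the helper name <code>differential_entropy_nats</code> is illustrative, not a library function). It evaluates the defining integral for a standard normal density and compares it with the closed form <math>\tfrac{1}{2}\log(2\pi e\sigma^2)</math> that appears later in this article.

<syntaxhighlight lang="python">
import numpy as np
from scipy import integrate, stats

def differential_entropy_nats(pdf, lower, upper):
    """Numerically evaluate h(f) = -integral of f(x) * ln f(x) over [lower, upper]."""
    integrand = lambda x: -pdf(x) * np.log(pdf(x)) if pdf(x) > 0 else 0.0
    value, _ = integrate.quad(integrand, lower, upper)
    return value

pdf = stats.norm(loc=0.0, scale=1.0).pdf          # standard normal density
print(differential_entropy_nats(pdf, -10, 10))     # ~1.4189 nats
print(0.5 * np.log(2 * np.pi * np.e))              # closed form, same value
</syntaxhighlight>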
| | | |
For probability distributions which don't have an explicit density function expression, but have an explicit [[quantile function]] expression, <math>Q(p)</math>, then <math>h(Q)</math> can be defined in terms of the derivative of <math>Q(p)</math> i.e. the quantile density function <math>Q'(p)</math> as <ref>{{Citation |last1=Vasicek |first1=Oldrich |year=1976 |title=A Test for Normality Based on Sample Entropy |journal=[[Journal of the Royal Statistical Society, Series B]] |volume=38 |issue=1 |jstor=2984828 |postscript=. }}</ref>{{rp|54–59}}

对于没有显式密度函数表达式、但有显式分位数函数表达式 <math>Q(p)</math> 的概率分布,我们则可以用 <math>Q(p)</math> 的导数,即分位数密度函数 <math>Q'(p)</math>,来定义 <math>h(Q)</math>,即

:<math>h(Q) = \int_0^1 \log Q'(p)\,dp</math>.
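As a quick, hedged check of this quantile-based form (assuming SciPy is available; the rate value <code>lam = 2.0</code> is an arbitrary illustrative choice): for an exponential distribution the quantile function is <math>Q(p) = -\ln(1-p)/\lambda</math>, so <math>Q'(p) = 1/(\lambda(1-p))</math>, and the integral reproduces the closed form <math>1 - \ln\lambda</math> derived later in this article.

<syntaxhighlight lang="python">
import numpy as np
from scipy import integrate

lam = 2.0  # illustrative rate parameter of an exponential distribution

# Quantile density Q'(p) of the exponential distribution
quantile_density = lambda p: 1.0 / (lam * (1.0 - p))

h_Q, _ = integrate.quad(lambda p: np.log(quantile_density(p)), 0.0, 1.0)
print(h_Q)                 # numerical value of the integral of log Q'(p)
print(1.0 - np.log(lam))   # closed form 1 - ln(lambda), same value
</syntaxhighlight>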
| | | |
As with its discrete analog, the units of differential entropy depend on the base of the [[logarithm]], which is usually 2 (i.e., the units are [[bit]]s). See [[logarithmic units]] for logarithms taken in different bases. Related concepts such as [[joint entropy|joint]], [[conditional entropy|conditional]] differential entropy, and [[Kullback–Leibler divergence|relative entropy]] are defined in a similar fashion. Unlike the discrete analog, the differential entropy has an offset that depends on the units used to measure <math>X</math>.<ref name="gibbs">{{cite book |last=Gibbs |first=Josiah Willard |authorlink=Josiah Willard Gibbs |title=[[Elementary Principles in Statistical Mechanics|Elementary Principles in Statistical Mechanics, developed with especial reference to the rational foundation of thermodynamics]] |year=1902 |publisher=Charles Scribner's Sons |location=New York}}</ref>{{rp|183–184}} For example, the differential entropy of a quantity measured in millimeters will be {{not a typo|log(1000)}} more than the same quantity measured in meters; a dimensionless quantity will have differential entropy of {{not a typo|log(1000)}} more than the same quantity divided by 1000.
One must take care in trying to apply properties of discrete entropy to differential entropy, since probability density functions can be greater than 1. For example, the [[Uniform distribution (continuous)|uniform distribution]] <math>\mathcal{U}(0,1/2)</math> has ''negative'' differential entropy

:<math>\int_0^\frac{1}{2} -2\log(2)\,dx=-\log(2)\,</math>.
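A one-line numerical confirmation of this negative value (assuming SciPy is available; <code>scipy.stats</code> frozen distributions expose an <code>entropy()</code> method that returns the differential entropy in nats):

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

# Uniform distribution on (0, 1/2): differential entropy is -ln 2 < 0
print(stats.uniform(loc=0.0, scale=0.5).entropy())  # ~ -0.6931
print(-np.log(2))                                    # closed form -ln 2
</syntaxhighlight>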
| | | |
Thus, differential entropy does not share all properties of discrete entropy.
| | | |
| | | |
Note that the continuous [[mutual information]] <math>I(X;Y)</math> has the distinction of retaining its fundamental significance as a measure of discrete information since it is actually the limit of the discrete mutual information of ''partitions'' of <math>X</math> and <math>Y</math> as these partitions become finer and finer. Thus it is invariant under non-linear [[homeomorphisms]] (continuous and uniquely invertible maps),<ref>{{cite journal | first = Alexander | last = Kraskov |author2=Stögbauer, Grassberger | year = 2004 | title = Estimating mutual information | journal = [[Physical Review E]] | volume = 69 | pages = 066138 | doi = 10.1103/PhysRevE.69.066138 |arxiv = cond-mat/0305641 |bibcode = 2004PhRvE..69f6138K }}</ref> including linear<ref name = Reza>{{ cite book | title = An Introduction to Information Theory | author = Fazlollah M. Reza | publisher = Dover Publications, Inc., New York | origyear = 1961| year = 1994 | isbn = 0-486-68210-2 | url = https://books.google.com/books?id=RtzpRAiX6OgC&pg=PA8&dq=intitle:%22An+Introduction+to+Information+Theory%22++%22entropy+of+a+simple+source%22&as_brr=0&ei=zP79Ro7UBovqoQK4g_nCCw&sig=j3lPgyYrC3-bvn1Td42TZgTzj0Q }}</ref> transformations of <math>X</math> and <math>Y</math>, and still represents the amount of discrete information that can be transmitted over a channel that admits a continuous space of values.
| | | |
| | | |
For the direct analogue of discrete entropy extended to the continuous space, see [[limiting density of discrete points]]. Discretizing the density into bins of width <math>h</math> gives the discrete entropy

:<math>H_h=-\sum_i hf(ih)\log (f(ih)) - \sum_i hf(ih)\log(h).</math>

The first term on the right approximates the differential entropy, while the second term is approximately <math>-\log(h)</math>. Note that this procedure suggests that the entropy in the discrete sense of a continuous random variable should be <math>\infty</math>.

右边的第一项近似于微分熵,而第二项近似于 <math>-\log(h)</math>。请注意,这一过程表明,连续随机变量在离散意义上的熵应该是无穷大。
==Properties of differential entropy==
微分熵的性质

* For probability densities <math>f</math> and <math>g</math>, the [[Kullback–Leibler divergence]] <math>D_{KL}(f || g)</math> is greater than or equal to 0 with equality only if <math>f=g</math> [[almost everywhere]]. Similarly, for two random variables <math>X</math> and <math>Y</math>, <math>I(X;Y) \ge 0</math> and <math>h(X|Y) \le h(X)</math> with equality [[if and only if]] <math>X</math> and <math>Y</math> are [[Statistical independence|independent]].

* The chain rule for differential entropy holds as in the discrete case<ref name="cover_thomas" />{{rp|253}}

::<math>h(X_1, \ldots, X_n) = \sum_{i=1}^{n} h(X_i|X_1, \ldots, X_{i-1}) \leq \sum_{i=1}^{n} h(X_i)</math>.
* Differential entropy is translation invariant, i.e. for a constant <math>c</math>.<ref name="cover_thomas" />{{rp|253}}

::<math>h(X+c) = h(X)</math>
* Differential entropy is in general not invariant under arbitrary invertible maps (see the numerical sketch below).

:: In particular, for a constant <math>a</math>
:: 特别地,对于一个常量 <math>a</math>

:::<math>h(aX) = h(X)+ \log |a|</math>

:: For a vector valued random variable <math>\mathbf{X}</math> and an invertible (square) [[matrix (mathematics)|matrix]] <math>\mathbf{A}</math>
:: 对于向量值随机变量 <math>\mathbf{X}</math> 和可逆(方)矩阵 <math>\mathbf{A}</math>

:::<math>h(\mathbf{A}\mathbf{X})=h(\mathbf{X})+\log \left( |\det \mathbf{A}| \right)</math><ref name="cover_thomas" />{{rp|253}}

:: In general, for a transformation from a random vector to another random vector with the same dimension <math>\mathbf{Y}=m\left(\mathbf{X}\right)</math>, the corresponding entropies are related via

::<math>h(\mathbf{Y}) \leq h(\mathbf{X}) + \int f(x) \log \left\vert \frac{\partial m}{\partial x} \right\vert dx</math>

:where <math>\left\vert \frac{\partial m}{\partial x} \right\vert</math> is the [[Jacobian matrix and determinant|Jacobian]] of the transformation <math>m</math>.<ref>{{cite web |title=proof of upper bound on differential entropy of f(X) |work=[[Stack Exchange]] |date=April 16, 2016 |url=https://math.stackexchange.com/q/1745670 }}</ref> The above inequality becomes an equality if the transform is a bijection. Furthermore, when <math>m</math> is a rigid rotation, translation, or combination thereof, the Jacobian determinant is always 1, and <math>h(Y)=h(X)</math>.
:其中 <math>\left\vert \frac{\partial m}{\partial x} \right\vert</math> 是变换 <math>m</math> 的雅可比矩阵。
− |
| |
− | Using the constraint equations to solve for λ<sub>0</sub> and λ yields the normal distribution:
| |
− |
| |
− | 用约束方程求解 λ0和 λ 得到正态分布:
| |
| | | |
* If a random vector <math>X \in \mathbb{R}^n</math> has mean zero and [[covariance]] matrix <math>K</math>, <math>h(\mathbf{X}) \leq \frac{1}{2} \log(\det{2 \pi e K}) = \frac{1}{2} \log[(2\pi e)^n \det{K}]</math> with equality if and only if <math>X</math> is [[Multivariate normal distribution#Joint normality|jointly gaussian]] (see [[#Maximization in the normal distribution|below]]).<ref name="cover_thomas" />{{rp|254}}

However, differential entropy does not have other desirable properties:

然而,微分熵并没有其他令人满意的特性:

* It is not invariant under [[change of variables]], and is therefore most useful with dimensionless variables.
它在变量变换下不是不变的,因此对无量纲变量最有用。
A modification of differential entropy that addresses these drawbacks is the '''relative information entropy''', also known as the Kullback–Leibler divergence, which includes an [[invariant measure]] factor (see [[limiting density of discrete points]]).
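The translation and scaling properties listed above can be checked numerically from closed forms (a sketch assuming NumPy and SciPy; the parameter values are arbitrary illustrative choices):

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

sigma = 1.3          # illustrative scale of X ~ N(0, sigma^2)
a, c = 2.5, 7.0      # illustrative scaling factor and shift

h_X  = stats.norm(loc=0.0, scale=sigma).entropy()           # h(X) in nats
h_Xc = stats.norm(loc=c, scale=sigma).entropy()              # h(X + c)
h_aX = stats.norm(loc=0.0, scale=abs(a) * sigma).entropy()   # h(aX)

print(np.isclose(h_Xc, h_X))                     # translation invariance
print(np.isclose(h_aX, h_X + np.log(abs(a))))    # h(aX) = h(X) + ln|a|

# Vector case: if X ~ N(0, K), then AX ~ N(0, A K A^T) and h(AX) = h(X) + ln|det A|
K = np.diag([1.0, 4.0])
A = np.array([[2.0, 1.0], [0.0, 3.0]])
h_vec = stats.multivariate_normal(cov=K).entropy()
h_AX  = stats.multivariate_normal(cov=A @ K @ A.T).entropy()
print(np.isclose(h_AX, h_vec + np.log(abs(np.linalg.det(A)))))
</syntaxhighlight>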
| |
| | | |
==Maximization in the normal distribution==

With a [[normal distribution]], differential entropy is maximized for a given variance. A Gaussian random variable has the largest entropy amongst all random variables of equal variance, or, alternatively, the maximum entropy distribution under constraints of mean and variance is the Gaussian.<ref name="cover_thomas" />{{rp|255}}

对于正态分布,在给定方差下微分熵是最大的。在所有方差相等的随机变量中,高斯随机变量的熵最大;或者说,在均值和方差约束下的最大熵分布就是高斯分布。
===Proof===
证明

Let <math>g(x)</math> be a [[Normal distribution|Gaussian]] [[Probability density function|PDF]] with mean μ and variance <math>\sigma^2</math> and <math>f(x)</math> an arbitrary [[Probability density function|PDF]] with the same variance. Since differential entropy is translation invariant we can assume that <math>f(x)</math> has the same mean of <math>\mu</math> as <math>g(x)</math>.

设 <math>g(x)</math> 是一个均值为 μ、方差为 <math>\sigma^2</math> 的高斯分布的 PDF,<math>f(x)</math> 是具有相同方差的任意一个 PDF。由于微分熵是平移不变的,我们可以假设 <math>f(x)</math> 与 <math>g(x)</math> 具有相同的均值 <math>\mu</math>。
Consider the [[Kullback–Leibler divergence]] between the two distributions

考虑两个分布之间的 Kullback–Leibler 散度

:<math> 0 \leq D_{KL}(f || g) = \int_{-\infty}^\infty f(x) \log \left( \frac{f(x)}{g(x)} \right) dx = -h(f) - \int_{-\infty}^\infty f(x)\log(g(x)) dx.</math>

Now note that

现在注意

:<math>\begin{align}
\int_{-\infty}^\infty f(x)\log(g(x)) dx &= \int_{-\infty}^\infty f(x)\log\left( \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\right) dx \\
&= \int_{-\infty}^\infty f(x) \log\frac{1}{\sqrt{2\pi\sigma^2}} dx + \log(e)\int_{-\infty}^\infty f(x)\left( -\frac{(x-\mu)^2}{2\sigma^2}\right) dx \\
&= -\tfrac{1}{2}\log(2\pi\sigma^2) - \log(e)\frac{\sigma^2}{2\sigma^2} \\
&= -\tfrac{1}{2}\left(\log(2\pi\sigma^2) + \log(e)\right) \\
&= -\tfrac{1}{2}\log(2\pi e \sigma^2) \\
&= -h(g)
\end{align}</math>
| | | |
because the result does not depend on <math>f(x)</math> other than through the variance. Combining the two results yields

因为这一结果除了通过方差之外并不依赖于 <math>f(x)</math>。将这两个结果结合起来,即得

:<math> h(g) - h(f) \geq 0 \!</math>

with equality when <math>f(x)=g(x)</math> following from the properties of Kullback–Leibler divergence.

当 <math>f(x)=g(x)</math> 时等号成立,这由 Kullback–Leibler 散度的性质得出。
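A quick numerical illustration of this maximality (assuming SciPy; the comparison distributions are arbitrary choices rescaled so that all three share the same variance):

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

sigma = 1.0   # common standard deviation for all three distributions

h_gauss   = stats.norm(scale=sigma).entropy()                    # 0.5*ln(2*pi*e*sigma^2) ~ 1.4189
h_uniform = stats.uniform(scale=sigma * np.sqrt(12)).entropy()   # uniform width chosen so the variance is sigma^2
h_laplace = stats.laplace(scale=sigma / np.sqrt(2)).entropy()    # Laplace scale chosen so the variance is sigma^2

print(h_gauss, h_uniform, h_laplace)   # the Gaussian entropy is the largest of the three
</syntaxhighlight>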
| | | |
===Alternative proof===
替代证明

This result may also be demonstrated using the [[variational calculus]]. A Lagrangian function with two [[Lagrangian multiplier]]s may be defined as:

这个结果也可以用变分法来证明。具有两个拉格朗日乘数的拉格朗日函数可以定义为:
− |
| |
− |
| |
| | | |
:<math>L=\int_{-\infty}^\infty g(x)\ln(g(x))\,dx-\lambda_0\left(1-\int_{-\infty}^\infty g(x)\,dx\right)-\lambda\left(\sigma^2-\int_{-\infty}^\infty g(x)(x-\mu)^2\,dx\right)</math>
− |
| |
where ''g(x)'' is some function with mean μ. When the entropy of ''g(x)'' is at a maximum and the constraint equations, which consist of the normalization condition <math>\left(1=\int_{-\infty}^\infty g(x)\,dx\right)</math> and the requirement of fixed variance <math>\left(\sigma^2=\int_{-\infty}^\infty g(x)(x-\mu)^2\,dx\right)</math>, are both satisfied, then a small variation δ''g''(''x'') about ''g(x)'' will produce a variation δ''L'' about ''L'' which is equal to zero:

其中 g(x) 是某个均值为 μ 的函数。当 g(x) 的熵达到最大值,并且由归一化条件 <math>\left(1=\int_{-\infty}^\infty g(x)\,dx\right)</math> 与固定方差要求 <math>\left(\sigma^2=\int_{-\infty}^\infty g(x)(x-\mu)^2\,dx\right)</math> 组成的约束方程都得到满足时,关于 g(x) 的一个小变分 δg(x) 将产生一个关于 L 的、等于零的变分 δL:
− |
| |
:<math>0=\delta L=\int_{-\infty}^\infty \delta g(x)\left (\ln(g(x))+1+\lambda_0+\lambda(x-\mu)^2\right )\,dx</math>
− |
| |
Since this must hold for any small δ''g''(''x''), the term in brackets must be zero, and solving for ''g(x)'' yields:

因为这必须对任何小的 δg(x) 都成立,所以括号中的项必须为零,求解 g(x) 得到:
− |
| |
:<math>g(x)=e^{-\lambda_0-1-\lambda(x-\mu)^2}</math>
− |
| |
Using the constraint equations to solve for λ<sub>0</sub> and λ yields the normal distribution:

用约束方程求解 λ<sub>0</sub> 和 λ,即得到正态分布:
− |
| |
:<math>g(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}</math>
| | | |
==Example: Exponential distribution==
例子:指数分布
Let <math>X</math> be an [[exponential distribution|exponentially distributed]] random variable with parameter <math>\lambda</math>, that is, with probability density function
− |
| |
:<math>f(x) = \lambda e^{-\lambda x} \mbox{ for } x \geq 0.</math>
− |
| |
Its differential entropy is then

它的微分熵就是
− |
| |
{|
|-
| <math>h_e(X)\,</math>
| <math>=-\int_0^\infty \lambda e^{-\lambda x} \log (\lambda e^{-\lambda x})\,dx</math>
|-
| 
| <math>= -\left(\int_0^\infty (\log \lambda)\lambda e^{-\lambda x}\,dx + \int_0^\infty (-\lambda x) \lambda e^{-\lambda x}\,dx\right) </math>
|-
| 
| <math>= -\log \lambda \int_0^\infty f(x)\,dx + \lambda E[X]</math>
|-
| 
| <math>= -\log\lambda + 1\,.</math>
|}
− |
| |
Here, <math>h_e(X)</math> was used rather than <math>h(X)</math> to make it explicit that the logarithm was taken to base ''e'', to simplify the calculation.

在这里,使用 <math>h_e(X)</math> 而不是 <math>h(X)</math>,是为了明确对数是以 e 为底的,以简化计算。
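The closed form <math>1 - \ln\lambda</math> can be sanity-checked against SciPy's built-in entropy for the exponential distribution (a sketch; the rate <code>lam = 0.5</code> is an arbitrary illustrative value, and SciPy parametrizes the distribution by the scale <math>1/\lambda</math>):

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

lam = 0.5  # illustrative rate parameter
print(stats.expon(scale=1.0 / lam).entropy())  # differential entropy in nats
print(1.0 - np.log(lam))                       # closed form 1 - ln(lambda), same value
</syntaxhighlight>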
− |
| |
==Relation to estimator error==
与估计量误差的联系

The differential entropy yields a lower bound on the expected squared error of an [[estimator]]. For any random variable <math>X</math> and estimator <math>\widehat{X}</math> the following holds:<ref name="cover_thomas" />

对于估计量的期望平方误差,微分熵给出了一个下界。对于任何随机变量 <math>X</math> 和估计量 <math>\widehat{X}</math>,下式成立:
− |
| |
:<math>\operatorname{E}[(X - \widehat{X})^2] \ge \frac{1}{2\pi e}e^{2h(X)}</math>
− |
| |
with equality if and only if <math>X</math> is a Gaussian random variable and <math>\widehat{X}</math> is the mean of <math>X</math>.

当且仅当 <math>X</math> 是高斯随机变量、且 <math>\widehat{X}</math> 是 <math>X</math> 的均值时,等号成立。
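For the Gaussian case this bound is attained exactly, which can be verified with a few lines of arithmetic (a sketch; the value of <code>sigma</code> is an arbitrary illustrative choice):

<syntaxhighlight lang="python">
import numpy as np

sigma = 1.7                                        # illustrative standard deviation
h = 0.5 * np.log(2 * np.pi * np.e * sigma**2)      # h(X) for X ~ N(mu, sigma^2), in nats

bound = np.exp(2 * h) / (2 * np.pi * np.e)         # right-hand side of the inequality
mse_of_mean = sigma**2                             # E[(X - mu)^2] when the estimator is the mean

print(bound, mse_of_mean)                          # equal: the Gaussian attains the bound
</syntaxhighlight>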
− |
| |
==Differential entropies for various distributions==

In the table below <math>\Gamma(x) = \int_0^{\infty} e^{-t} t^{x-1} dt</math> is the [[gamma function]], <math>\psi(x) = \frac{d}{dx} \ln\Gamma(x)=\frac{\Gamma'(x)}{\Gamma(x)}</math> is the [[digamma function]], <math>B(p,q) = \frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q)}</math> is the [[beta function]], and γ<sub>''E''</sub> is [[Euler-Mascheroni constant|Euler's constant]].<ref>{{cite journal |last1=Park |first1=Sung Y. |last2=Bera |first2=Anil K. |year=2009 |title=Maximum entropy autoregressive conditional heteroskedasticity model |journal=Journal of Econometrics |publisher=Elsevier |url=http://www.wise.xmu.edu.cn/Master/Download/..%5C..%5CUploadFiles%5Cpaper-masterdownload%5C2009519932327055475115776.pdf |access-date=2011-06-02 |archive-url=https://web.archive.org/web/20160307144515/http://wise.xmu.edu.cn/uploadfiles/paper-masterdownload/2009519932327055475115776.pdf |archive-date=2016-03-07 |url-status=dead }}</ref>{{rp|219–230}}
{| class="wikitable" style="background:white"
|+ Table of differential entropies
|-
! Distribution Name !! Probability density function (pdf) !! Entropy in [[Nat (unit)|nat]]s !! Support
|-
| [[Uniform distribution (continuous)|Uniform]] || <math>f(x) = \frac{1}{b-a}</math> || <math>\ln(b - a) \,</math> ||<math>[a,b]\,</math>
|-
| [[Normal distribution|Normal]] || <math>f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)</math> || <math>\ln\left(\sigma\sqrt{2\,\pi\,e}\right) </math>||<math>(-\infty,\infty)\,</math>
|-
| [[Exponential distribution|Exponential]] || <math>f(x) = \lambda \exp\left(-\lambda x\right)</math> || <math>1 - \ln \lambda \, </math>||<math>[0,\infty)\,</math>
|-
| [[Rayleigh distribution|Rayleigh]] || <math>f(x) = \frac{x}{\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right)</math> || <math>1 + \ln \frac{\sigma}{\sqrt{2}} + \frac{\gamma_E}{2}</math>||<math>[0,\infty)\,</math>
|-
| [[Beta distribution|Beta]] || <math>f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}</math> for <math>0 \leq x \leq 1</math> || <math> \ln B(\alpha,\beta) - (\alpha-1)[\psi(\alpha) - \psi(\alpha +\beta)]\,</math><br /><math>- (\beta-1)[\psi(\beta) - \psi(\alpha + \beta)] \, </math>||<math>[0,1]\,</math>
|-
| [[Cauchy distribution|Cauchy]] || <math>f(x) = \frac{\gamma}{\pi} \frac{1}{\gamma^2 + x^2}</math> || <math>\ln(4\pi\gamma) \, </math>||<math>(-\infty,\infty)\,</math>
|-
| [[Chi distribution|Chi]] || <math>f(x) = \frac{2}{2^{k/2} \Gamma(k/2)} x^{k-1} \exp\left(-\frac{x^2}{2}\right)</math> || <math>\ln{\frac{\Gamma(k/2)}{\sqrt{2}}} - \frac{k-1}{2} \psi\left(\frac{k}{2}\right) + \frac{k}{2}</math>||<math>[0,\infty)\,</math>
|-
| [[Chi-squared distribution|Chi-squared]] || <math>f(x) = \frac{1}{2^{k/2} \Gamma(k/2)} x^{\frac{k}{2}\!-\!1} \exp\left(-\frac{x}{2}\right)</math> || <math>\ln 2\Gamma\left(\frac{k}{2}\right) - \left(1 - \frac{k}{2}\right)\psi\left(\frac{k}{2}\right) + \frac{k}{2}</math>||<math>[0,\infty)\,</math>
|-
| [[Erlang distribution|Erlang]] || <math>f(x) = \frac{\lambda^k}{(k-1)!} x^{k-1} \exp(-\lambda x)</math> || <math>(1-k)\psi(k) + \ln \frac{\Gamma(k)}{\lambda} + k</math>||<math>[0,\infty)\,</math>
|-
| [[F distribution|F]] || <math>f(x) = \frac{n_1^{\frac{n_1}{2}} n_2^{\frac{n_2}{2}}}{B(\frac{n_1}{2},\frac{n_2}{2})} \frac{x^{\frac{n_1}{2} - 1}}{(n_2 + n_1 x)^{\frac{n_1 + n2}{2}}}</math> || <math>\ln \frac{n_1}{n_2} B\left(\frac{n_1}{2},\frac{n_2}{2}\right) + \left(1 - \frac{n_1}{2}\right) \psi\left(\frac{n_1}{2}\right) -</math><br /><math>\left(1 + \frac{n_2}{2}\right)\psi\left(\frac{n_2}{2}\right) + \frac{n_1 + n_2}{2} \psi\left(\frac{n_1\!+\!n_2}{2}\right)</math>||<math>[0,\infty)\,</math>
|-
| [[Gamma distribution|Gamma]] || <math>f(x) = \frac{x^{k - 1} \exp(-\frac{x}{\theta})}{\theta^k \Gamma(k)}</math> || <math>\ln(\theta \Gamma(k)) + (1 - k)\psi(k) + k \, </math>||<math>[0,\infty)\,</math>
|-
| [[Laplace distribution|Laplace]] || <math>f(x) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right)</math> || <math>1 + \ln(2b) \, </math>||<math>(-\infty,\infty)\,</math>
|-
| [[Logistic distribution|Logistic]] || <math>f(x) = \frac{e^{-x}}{(1 + e^{-x})^2}</math> || <math>2 \, </math>||<math>(-\infty,\infty)\,</math>
|-
| [[Log-normal distribution|Lognormal]] || <math>f(x) = \frac{1}{\sigma x \sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)</math> || <math>\mu + \frac{1}{2} \ln(2\pi e \sigma^2)</math>||<math>[0,\infty)\,</math>
|-
| [[Maxwell–Boltzmann distribution|Maxwell–Boltzmann]] || <math>f(x) = \frac{1}{a^3}\sqrt{\frac{2}{\pi}}\,x^{2}\exp\left(-\frac{x^2}{2a^2}\right)</math> || <math>\ln(a\sqrt{2\pi})+\gamma_E-\frac{1}{2}</math>||<math>[0,\infty)\,</math>
|-
| [[Generalized Gaussian distribution|Generalized normal]] || <math>f(x) = \frac{2 \beta^{\frac{\alpha}{2}}}{\Gamma(\frac{\alpha}{2})} x^{\alpha - 1} \exp(-\beta x^2)</math> || <math>\ln{\frac{\Gamma(\alpha/2)}{2\beta^{\frac{1}{2}}}} - \frac{\alpha - 1}{2} \psi\left(\frac{\alpha}{2}\right) + \frac{\alpha}{2}</math>||<math>(-\infty,\infty)\,</math>
|-
| [[Pareto distribution|Pareto]] || <math>f(x) = \frac{\alpha x_m^\alpha}{x^{\alpha+1}}</math> || <math>\ln \frac{x_m}{\alpha} + 1 + \frac{1}{\alpha}</math>||<math>[x_m,\infty)\,</math>
|-
| [[Student's t-distribution|Student's t]] || <math>f(x) = \frac{(1 + x^2/\nu)^{-\frac{\nu+1}{2}}}{\sqrt{\nu}B(\frac{1}{2},\frac{\nu}{2})}</math> || <math>\frac{\nu\!+\!1}{2}\left(\psi\left(\frac{\nu\!+\!1}{2}\right)\!-\!\psi\left(\frac{\nu}{2}\right)\right)\!+\!\ln \sqrt{\nu} B\left(\frac{1}{2},\frac{\nu}{2}\right)</math>||<math>(-\infty,\infty)\,</math>
|-
| [[Triangular distribution|Triangular]] || <math> f(x) = \begin{cases}
\frac{2(x-a)}{(b-a)(c-a)} & \mathrm{for\ } a \le x \leq c, \\[4pt]
\frac{2(b-x)}{(b-a)(b-c)} & \mathrm{for\ } c < x \le b, \\[4pt]
\end{cases}</math> || <math>\frac{1}{2} + \ln \frac{b-a}{2}</math>||<math>[0,1]\,</math>
|-
| [[Weibull distribution|Weibull]] || <math>f(x) = \frac{k}{\lambda^k} x^{k-1} \exp\left(-\frac{x^k}{\lambda^k}\right)</math> || <math>\frac{(k-1)\gamma_E}{k} + \ln \frac{\lambda}{k} + 1</math>||<math>[0,\infty)\,</math>
|-
| [[Multivariate normal distribution|Multivariate normal]] || <math>
f_X(\vec{x}) =</math><br /><math> \frac{\exp \left( -\frac{1}{2} ( \vec{x} - \vec{\mu})^\top \Sigma^{-1}\cdot(\vec{x} - \vec{\mu}) \right)} {(2\pi)^{N/2} \left|\Sigma\right|^{1/2}}</math> || <math>\frac{1}{2}\ln\{(2\pi e)^{N} \det(\Sigma)\}</math>||<math>\mathbb{R}^N</math>
|}
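A couple of the closed forms above can be spot-checked against SciPy's built-in differential entropies (a sketch; the parameter values are arbitrary illustrative choices):

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

gamma_E = np.euler_gamma
sigma, b = 2.0, 1.5   # illustrative Rayleigh scale and Laplace diversity parameters

# Rayleigh row: 1 + ln(sigma/sqrt(2)) + gamma_E/2
print(stats.rayleigh(scale=sigma).entropy(), 1 + np.log(sigma / np.sqrt(2)) + gamma_E / 2)

# Laplace row: 1 + ln(2b)
print(stats.laplace(scale=b).entropy(), 1 + np.log(2 * b))
</syntaxhighlight>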
| | | |
Many of the differential entropies in the table are taken from<ref name="lazorathie">{{cite journal|author=Lazo, A. and P. Rathie|title=On the entropy of continuous probability distributions|journal=IEEE Transactions on Information Theory|year=1978|volume=24 |issue=1|doi=10.1109/TIT.1978.1055832|pages=120–122}}</ref>{{rp|120–122}}.

---------

[[Category:熵和信息]]
[[Category:信息论]]
[[Category:统计的随机性]]