此词条暂由Henry翻译。

由CecileLi初步审校。
<font color="#ff8000">微分熵 Differential entropy</font>(也被称为连续熵)是信息论中的一个概念,其来源于香农尝试将他的香农熵概念扩展到连续的概率分布。香农熵是衡量一个随机变量的平均惊异程度的指标。可惜的是,香农只是假设它是离散熵的正确连续模拟,而并没有推导出这一公式;但事实上,它并不是离散熵的正确连续模拟。

==Definition==
定义
Let <math>X</math> be a random variable with a [[probability density function]] <math>f</math> whose [[support (mathematics)|support]] is a set <math>\mathcal X</math>. The ''differential entropy'' <math>h(X)</math> or <math>h(f)</math> is defined as<ref name="cover_thomas">{{cite book|first1=Thomas M.|first2=Joy A.|last1=Cover|last2=Thomas|isbn=0-471-06259-6|title=Elements of Information Theory|year=1991|publisher=Wiley|location=New York|url=https://archive.org/details/elementsofinform0000cove|url-access=registration}}</ref>{{rp|243}}

--[[用户:CecileLi|CecileLi]]([[用户讨论:CecileLi|讨论]]) 【审校】此处缺无格式的英文及翻译 补充:设随机变量 <math>X</math>,其概率密度函数 <math>f</math> 的支撑集是集合 <math>\mathcal X</math>,则微分熵 <math>h(X)</math> 或 <math>h(f)</math> 定义为

:<math>h(X) = -\int_\mathcal{X} f(x)\log f(x)\,dx</math>
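The following is a minimal numerical sketch of this definition (assuming NumPy and SciPy are available; the helper name <code>differential_entropy_nats</code> is illustrative, not a library function). It evaluates the defining integral for a standard normal density and compares it with the closed form <math>\tfrac{1}{2}\log(2\pi e\sigma^2)</math> that appears later in this article.

<syntaxhighlight lang="python">
import numpy as np
from scipy import integrate, stats

def differential_entropy_nats(pdf, lower, upper):
    """Numerically evaluate h(f) = -integral of f(x) * ln f(x) over [lower, upper]."""
    integrand = lambda x: -pdf(x) * np.log(pdf(x)) if pdf(x) > 0 else 0.0
    value, _ = integrate.quad(integrand, lower, upper)
    return value

pdf = stats.norm(loc=0.0, scale=1.0).pdf          # standard normal density
print(differential_entropy_nats(pdf, -10, 10))     # ~1.4189 nats
print(0.5 * np.log(2 * np.pi * np.e))              # closed form, same value
</syntaxhighlight>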
| | | |
For probability distributions which don't have an explicit density function expression, but have an explicit [[quantile function]] expression, <math>Q(p)</math>, then <math>h(Q)</math> can be defined in terms of the derivative of <math>Q(p)</math> i.e. the quantile density function <math>Q'(p)</math> as <ref>{{Citation |last1=Vasicek |first1=Oldrich |year=1976 |title=A Test for Normality Based on Sample Entropy |journal=[[Journal of the Royal Statistical Society, Series B]] |volume=38 |issue=1 |jstor=2984828 |postscript=. }}</ref>{{rp|54–59}}

对于没有显式密度函数表达式、但有显式分位数函数表达式 <math>Q(p)</math> 的概率分布,我们则可以用 <math>Q(p)</math> 的导数,即分位数密度函数 <math>Q'(p)</math>,来定义 <math>h(Q)</math>,即

:<math>h(Q) = \int_0^1 \log Q'(p)\,dp</math>.
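As a quick, hedged check of this quantile-based form (assuming SciPy is available; the rate value <code>lam = 2.0</code> is an arbitrary illustrative choice): for an exponential distribution the quantile function is <math>Q(p) = -\ln(1-p)/\lambda</math>, so <math>Q'(p) = 1/(\lambda(1-p))</math>, and the integral reproduces the closed form <math>1 - \ln\lambda</math> derived later in this article.

<syntaxhighlight lang="python">
import numpy as np
from scipy import integrate

lam = 2.0  # illustrative rate parameter of an exponential distribution

# Quantile density Q'(p) of the exponential distribution
quantile_density = lambda p: 1.0 / (lam * (1.0 - p))

h_Q, _ = integrate.quad(lambda p: np.log(quantile_density(p)), 0.0, 1.0)
print(h_Q)                 # numerical value of the integral of log Q'(p)
print(1.0 - np.log(lam))   # closed form 1 - ln(lambda), same value
</syntaxhighlight>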
| | | |
As with its discrete analog, the units of differential entropy depend on the base of the [[logarithm]], which is usually 2 (i.e., the units are [[bit]]s). See [[logarithmic units]] for logarithms taken in different bases. Related concepts such as [[joint entropy|joint]], [[conditional entropy|conditional]] differential entropy, and [[Kullback–Leibler divergence|relative entropy]] are defined in a similar fashion. Unlike the discrete analog, the differential entropy has an offset that depends on the units used to measure <math>X</math>.<ref name="gibbs">{{cite book |last=Gibbs |first=Josiah Willard |authorlink=Josiah Willard Gibbs |title=[[Elementary Principles in Statistical Mechanics|Elementary Principles in Statistical Mechanics, developed with especial reference to the rational foundation of thermodynamics]] |year=1902 |publisher=Charles Scribner's Sons |location=New York}}</ref>{{rp|183–184}} For example, the differential entropy of a quantity measured in millimeters will be {{not a typo|log(1000)}} more than the same quantity measured in meters; a dimensionless quantity will have differential entropy of {{not a typo|log(1000)}} more than the same quantity divided by 1000.
One must take care in trying to apply properties of discrete entropy to differential entropy, since probability density functions can be greater than 1. For example, the [[Uniform distribution (continuous)|uniform distribution]] <math>\mathcal{U}(0,1/2)</math> has ''negative'' differential entropy

:<math>\int_0^\frac{1}{2} -2\log(2)\,dx=-\log(2)\,</math>.
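A one-line numerical confirmation of this negative value (assuming SciPy is available; <code>scipy.stats</code> frozen distributions expose an <code>entropy()</code> method that returns the differential entropy in nats):

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

# Uniform distribution on (0, 1/2): differential entropy is -ln 2 < 0
print(stats.uniform(loc=0.0, scale=0.5).entropy())  # ~ -0.6931
print(-np.log(2))                                    # closed form -ln 2
</syntaxhighlight>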
| | | |
Thus, differential entropy does not share all properties of discrete entropy.
| | | |
| | | |
Note that the continuous [[mutual information]] <math>I(X;Y)</math> has the distinction of retaining its fundamental significance as a measure of discrete information since it is actually the limit of the discrete mutual information of ''partitions'' of <math>X</math> and <math>Y</math> as these partitions become finer and finer. Thus it is invariant under non-linear [[homeomorphisms]] (continuous and uniquely invertible maps),<ref>{{cite journal | first = Alexander | last = Kraskov |author2=Stögbauer, Grassberger | year = 2004 | title = Estimating mutual information | journal = [[Physical Review E]] | volume = 69 | pages = 066138 | doi = 10.1103/PhysRevE.69.066138 |arxiv = cond-mat/0305641 |bibcode = 2004PhRvE..69f6138K }}</ref> including linear<ref name = Reza>{{ cite book | title = An Introduction to Information Theory | author = Fazlollah M. Reza | publisher = Dover Publications, Inc., New York | origyear = 1961| year = 1994 | isbn = 0-486-68210-2 | url = https://books.google.com/books?id=RtzpRAiX6OgC&pg=PA8&dq=intitle:%22An+Introduction+to+Information+Theory%22++%22entropy+of+a+simple+source%22&as_brr=0&ei=zP79Ro7UBovqoQK4g_nCCw&sig=j3lPgyYrC3-bvn1Td42TZgTzj0Q }}</ref> transformations of <math>X</math> and <math>Y</math>, and still represents the amount of discrete information that can be transmitted over a channel that admits a continuous space of values.
| | | |
| | | |
For the direct analogue of discrete entropy extended to the continuous space, see [[limiting density of discrete points]]. Discretizing the density into bins of width <math>h</math> gives the discrete entropy

:<math>H_h=-\sum_i hf(ih)\log (f(ih)) - \sum_i hf(ih)\log(h).</math>

The first term on the right approximates the differential entropy, while the second term is approximately <math>-\log(h)</math>. Note that this procedure suggests that the entropy in the discrete sense of a continuous random variable should be <math>\infty</math>.

右边的第一项近似于微分熵,而第二项近似于 <math>-\log(h)</math>。请注意,这一过程表明,连续随机变量在离散意义上的熵应该是无穷大。
==Properties of differential entropy==
微分熵的性质

* For probability densities <math>f</math> and <math>g</math>, the [[Kullback–Leibler divergence]] <math>D_{KL}(f || g)</math> is greater than or equal to 0 with equality only if <math>f=g</math> [[almost everywhere]]. Similarly, for two random variables <math>X</math> and <math>Y</math>, <math>I(X;Y) \ge 0</math> and <math>h(X|Y) \le h(X)</math> with equality [[if and only if]] <math>X</math> and <math>Y</math> are [[Statistical independence|independent]].

* The chain rule for differential entropy holds as in the discrete case<ref name="cover_thomas" />{{rp|253}}

::<math>h(X_1, \ldots, X_n) = \sum_{i=1}^{n} h(X_i|X_1, \ldots, X_{i-1}) \leq \sum_{i=1}^{n} h(X_i)</math>.
* Differential entropy is translation invariant, i.e. for a constant <math>c</math>.<ref name="cover_thomas" />{{rp|253}}

::<math>h(X+c) = h(X)</math>
* Differential entropy is in general not invariant under arbitrary invertible maps (see the numerical sketch below).

:: In particular, for a constant <math>a</math>
:: 特别地,对于一个常量 <math>a</math>

:::<math>h(aX) = h(X)+ \log |a|</math>

:: For a vector valued random variable <math>\mathbf{X}</math> and an invertible (square) [[matrix (mathematics)|matrix]] <math>\mathbf{A}</math>
:: 对于向量值随机变量 <math>\mathbf{X}</math> 和可逆(方)矩阵 <math>\mathbf{A}</math>

:::<math>h(\mathbf{A}\mathbf{X})=h(\mathbf{X})+\log \left( |\det \mathbf{A}| \right)</math><ref name="cover_thomas" />{{rp|253}}

:: In general, for a transformation from a random vector to another random vector with the same dimension <math>\mathbf{Y}=m\left(\mathbf{X}\right)</math>, the corresponding entropies are related via

::<math>h(\mathbf{Y}) \leq h(\mathbf{X}) + \int f(x) \log \left\vert \frac{\partial m}{\partial x} \right\vert dx</math>

:where <math>\left\vert \frac{\partial m}{\partial x} \right\vert</math> is the [[Jacobian matrix and determinant|Jacobian]] of the transformation <math>m</math>.<ref>{{cite web |title=proof of upper bound on differential entropy of f(X) |work=[[Stack Exchange]] |date=April 16, 2016 |url=https://math.stackexchange.com/q/1745670 }}</ref> The above inequality becomes an equality if the transform is a bijection. Furthermore, when <math>m</math> is a rigid rotation, translation, or combination thereof, the Jacobian determinant is always 1, and <math>h(Y)=h(X)</math>.
:其中 <math>\left\vert \frac{\partial m}{\partial x} \right\vert</math> 是变换 <math>m</math> 的雅可比矩阵。
− |
| |
− | Using the constraint equations to solve for λ<sub>0</sub> and λ yields the normal distribution:
| |
− |
| |
− | 用约束方程求解 λ0和 λ 得到正态分布:
| |
| | | |
* If a random vector <math>X \in \mathbb{R}^n</math> has mean zero and [[covariance]] matrix <math>K</math>, <math>h(\mathbf{X}) \leq \frac{1}{2} \log(\det{2 \pi e K}) = \frac{1}{2} \log[(2\pi e)^n \det{K}]</math> with equality if and only if <math>X</math> is [[Multivariate normal distribution#Joint normality|jointly gaussian]] (see [[#Maximization in the normal distribution|below]]).<ref name="cover_thomas" />{{rp|254}}

However, differential entropy does not have other desirable properties:

然而,微分熵并没有其他令人满意的特性:

* It is not invariant under [[change of variables]], and is therefore most useful with dimensionless variables.
它在变量变换下不是不变的,因此对无量纲变量最有用。
A modification of differential entropy that addresses these drawbacks is the '''relative information entropy''', also known as the Kullback–Leibler divergence, which includes an [[invariant measure]] factor (see [[limiting density of discrete points]]).
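The translation and scaling properties listed above can be checked numerically from closed forms (a sketch assuming NumPy and SciPy; the parameter values are arbitrary illustrative choices):

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

sigma = 1.3          # illustrative scale of X ~ N(0, sigma^2)
a, c = 2.5, 7.0      # illustrative scaling factor and shift

h_X  = stats.norm(loc=0.0, scale=sigma).entropy()           # h(X) in nats
h_Xc = stats.norm(loc=c, scale=sigma).entropy()              # h(X + c)
h_aX = stats.norm(loc=0.0, scale=abs(a) * sigma).entropy()   # h(aX)

print(np.isclose(h_Xc, h_X))                     # translation invariance
print(np.isclose(h_aX, h_X + np.log(abs(a))))    # h(aX) = h(X) + ln|a|

# Vector case: if X ~ N(0, K), then AX ~ N(0, A K A^T) and h(AX) = h(X) + ln|det A|
K = np.diag([1.0, 4.0])
A = np.array([[2.0, 1.0], [0.0, 3.0]])
h_vec = stats.multivariate_normal(cov=K).entropy()
h_AX  = stats.multivariate_normal(cov=A @ K @ A.T).entropy()
print(np.isclose(h_AX, h_vec + np.log(abs(np.linalg.det(A)))))
</syntaxhighlight>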
| |
| | | |
==Maximization in the normal distribution==

With a [[normal distribution]], differential entropy is maximized for a given variance. A Gaussian random variable has the largest entropy amongst all random variables of equal variance, or, alternatively, the maximum entropy distribution under constraints of mean and variance is the Gaussian.<ref name="cover_thomas" />{{rp|255}}

对于正态分布,在给定方差下微分熵是最大的。在所有方差相等的随机变量中,高斯随机变量的熵最大;或者说,在均值和方差约束下的最大熵分布就是高斯分布。
===Proof===
证明

Let <math>g(x)</math> be a [[Normal distribution|Gaussian]] [[Probability density function|PDF]] with mean μ and variance <math>\sigma^2</math> and <math>f(x)</math> an arbitrary [[Probability density function|PDF]] with the same variance. Since differential entropy is translation invariant we can assume that <math>f(x)</math> has the same mean of <math>\mu</math> as <math>g(x)</math>.

设 <math>g(x)</math> 是一个均值为 μ、方差为 <math>\sigma^2</math> 的高斯分布的 PDF,<math>f(x)</math> 是具有相同方差的任意一个 PDF。由于微分熵是平移不变的,我们可以假设 <math>f(x)</math> 与 <math>g(x)</math> 具有相同的均值 <math>\mu</math>。
Consider the [[Kullback–Leibler divergence]] between the two distributions

考虑两个分布之间的 Kullback–Leibler 散度

:<math> 0 \leq D_{KL}(f || g) = \int_{-\infty}^\infty f(x) \log \left( \frac{f(x)}{g(x)} \right) dx = -h(f) - \int_{-\infty}^\infty f(x)\log(g(x)) dx.</math>

Now note that

现在注意

:<math>\begin{align}
\int_{-\infty}^\infty f(x)\log(g(x)) dx &= \int_{-\infty}^\infty f(x)\log\left( \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\right) dx \\
&= \int_{-\infty}^\infty f(x) \log\frac{1}{\sqrt{2\pi\sigma^2}} dx + \log(e)\int_{-\infty}^\infty f(x)\left( -\frac{(x-\mu)^2}{2\sigma^2}\right) dx \\
&= -\tfrac{1}{2}\log(2\pi\sigma^2) - \log(e)\frac{\sigma^2}{2\sigma^2} \\
&= -\tfrac{1}{2}\left(\log(2\pi\sigma^2) + \log(e)\right) \\
&= -\tfrac{1}{2}\log(2\pi e \sigma^2) \\
&= -h(g)
\end{align}</math>
| | | |
because the result does not depend on <math>f(x)</math> other than through the variance. Combining the two results yields

因为这一结果除了通过方差之外并不依赖于 <math>f(x)</math>。将这两个结果结合起来,即得

:<math> h(g) - h(f) \geq 0 \!</math>

with equality when <math>f(x)=g(x)</math> following from the properties of Kullback–Leibler divergence.

当 <math>f(x)=g(x)</math> 时等号成立,这由 Kullback–Leibler 散度的性质得出。
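A quick numerical illustration of this maximality (assuming SciPy; the comparison distributions are arbitrary choices rescaled so that all three share the same variance):

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

sigma = 1.0   # common standard deviation for all three distributions

h_gauss   = stats.norm(scale=sigma).entropy()                    # 0.5*ln(2*pi*e*sigma^2) ~ 1.4189
h_uniform = stats.uniform(scale=sigma * np.sqrt(12)).entropy()   # uniform width chosen so the variance is sigma^2
h_laplace = stats.laplace(scale=sigma / np.sqrt(2)).entropy()    # Laplace scale chosen so the variance is sigma^2

print(h_gauss, h_uniform, h_laplace)   # the Gaussian entropy is the largest of the three
</syntaxhighlight>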
| | | |
===Alternative proof===
替代证明

This result may also be demonstrated using the [[variational calculus]]. A Lagrangian function with two [[Lagrangian multiplier]]s may be defined as:

这个结果也可以用变分法来证明。具有两个拉格朗日乘数的拉格朗日函数可以定义为:
− |
| |
− |
| |
| | | |
:<math>L=\int_{-\infty}^\infty g(x)\ln(g(x))\,dx-\lambda_0\left(1-\int_{-\infty}^\infty g(x)\,dx\right)-\lambda\left(\sigma^2-\int_{-\infty}^\infty g(x)(x-\mu)^2\,dx\right)</math>
− |
| |
where ''g(x)'' is some function with mean μ. When the entropy of ''g(x)'' is at a maximum and the constraint equations, which consist of the normalization condition <math>\left(1=\int_{-\infty}^\infty g(x)\,dx\right)</math> and the requirement of fixed variance <math>\left(\sigma^2=\int_{-\infty}^\infty g(x)(x-\mu)^2\,dx\right)</math>, are both satisfied, then a small variation δ''g''(''x'') about ''g(x)'' will produce a variation δ''L'' about ''L'' which is equal to zero:

其中 g(x) 是某个均值为 μ 的函数。当 g(x) 的熵达到最大值,并且由归一化条件 <math>\left(1=\int_{-\infty}^\infty g(x)\,dx\right)</math> 与固定方差要求 <math>\left(\sigma^2=\int_{-\infty}^\infty g(x)(x-\mu)^2\,dx\right)</math> 组成的约束方程都得到满足时,关于 g(x) 的一个小变分 δg(x) 将产生一个关于 L 的、等于零的变分 δL:
− |
| |
:<math>0=\delta L=\int_{-\infty}^\infty \delta g(x)\left (\ln(g(x))+1+\lambda_0+\lambda(x-\mu)^2\right )\,dx</math>
− |
| |
Since this must hold for any small δ''g''(''x''), the term in brackets must be zero, and solving for ''g(x)'' yields:

因为这必须对任何小的 δg(x) 都成立,所以括号中的项必须为零,求解 g(x) 得到:
− |
| |
:<math>g(x)=e^{-\lambda_0-1-\lambda(x-\mu)^2}</math>
− |
| |
Using the constraint equations to solve for λ<sub>0</sub> and λ yields the normal distribution:

用约束方程求解 λ<sub>0</sub> 和 λ,即得到正态分布:
− |
| |
:<math>g(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}</math>
| | | |
==Example: Exponential distribution==
例子:指数分布
Let <math>X</math> be an [[exponential distribution|exponentially distributed]] random variable with parameter <math>\lambda</math>, that is, with probability density function
− |
| |
:<math>f(x) = \lambda e^{-\lambda x} \mbox{ for } x \geq 0.</math>
− |
| |
Its differential entropy is then

它的微分熵就是
− |
| |
{|
|-
| <math>h_e(X)\,</math>
| <math>=-\int_0^\infty \lambda e^{-\lambda x} \log (\lambda e^{-\lambda x})\,dx</math>
|-
| 
| <math>= -\left(\int_0^\infty (\log \lambda)\lambda e^{-\lambda x}\,dx + \int_0^\infty (-\lambda x) \lambda e^{-\lambda x}\,dx\right) </math>
|-
| 
| <math>= -\log \lambda \int_0^\infty f(x)\,dx + \lambda E[X]</math>
|-
| 
| <math>= -\log\lambda + 1\,.</math>
|}
− |
| |
Here, <math>h_e(X)</math> was used rather than <math>h(X)</math> to make it explicit that the logarithm was taken to base ''e'', to simplify the calculation.

在这里,使用 <math>h_e(X)</math> 而不是 <math>h(X)</math>,是为了明确对数是以 e 为底的,以简化计算。
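The closed form <math>1 - \ln\lambda</math> can be sanity-checked against SciPy's built-in entropy for the exponential distribution (a sketch; the rate <code>lam = 0.5</code> is an arbitrary illustrative value, and SciPy parametrizes the distribution by the scale <math>1/\lambda</math>):

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

lam = 0.5  # illustrative rate parameter
print(stats.expon(scale=1.0 / lam).entropy())  # differential entropy in nats
print(1.0 - np.log(lam))                       # closed form 1 - ln(lambda), same value
</syntaxhighlight>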
− |
| |
==Relation to estimator error==
与估计量误差的联系

The differential entropy yields a lower bound on the expected squared error of an [[estimator]]. For any random variable <math>X</math> and estimator <math>\widehat{X}</math> the following holds:<ref name="cover_thomas" />

对于估计量的期望平方误差,微分熵给出了一个下界。对于任何随机变量 <math>X</math> 和估计量 <math>\widehat{X}</math>,下式成立:
− |
| |
:<math>\operatorname{E}[(X - \widehat{X})^2] \ge \frac{1}{2\pi e}e^{2h(X)}</math>
− |
| |
with equality if and only if <math>X</math> is a Gaussian random variable and <math>\widehat{X}</math> is the mean of <math>X</math>.

当且仅当 <math>X</math> 是高斯随机变量、且 <math>\widehat{X}</math> 是 <math>X</math> 的均值时,等号成立。
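For the Gaussian case this bound is attained exactly, which can be verified with a few lines of arithmetic (a sketch; the value of <code>sigma</code> is an arbitrary illustrative choice):

<syntaxhighlight lang="python">
import numpy as np

sigma = 1.7                                        # illustrative standard deviation
h = 0.5 * np.log(2 * np.pi * np.e * sigma**2)      # h(X) for X ~ N(mu, sigma^2), in nats

bound = np.exp(2 * h) / (2 * np.pi * np.e)         # right-hand side of the inequality
mse_of_mean = sigma**2                             # E[(X - mu)^2] when the estimator is the mean

print(bound, mse_of_mean)                          # equal: the Gaussian attains the bound
</syntaxhighlight>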
− |
| |
==Differential entropies for various distributions==

In the table below <math>\Gamma(x) = \int_0^{\infty} e^{-t} t^{x-1} dt</math> is the [[gamma function]], <math>\psi(x) = \frac{d}{dx} \ln\Gamma(x)=\frac{\Gamma'(x)}{\Gamma(x)}</math> is the [[digamma function]], <math>B(p,q) = \frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q)}</math> is the [[beta function]], and γ<sub>''E''</sub> is [[Euler-Mascheroni constant|Euler's constant]].<ref>{{cite journal |last1=Park |first1=Sung Y. |last2=Bera |first2=Anil K. |year=2009 |title=Maximum entropy autoregressive conditional heteroskedasticity model |journal=Journal of Econometrics |publisher=Elsevier |url=http://www.wise.xmu.edu.cn/Master/Download/..%5C..%5CUploadFiles%5Cpaper-masterdownload%5C2009519932327055475115776.pdf |access-date=2011-06-02 |archive-url=https://web.archive.org/web/20160307144515/http://wise.xmu.edu.cn/uploadfiles/paper-masterdownload/2009519932327055475115776.pdf |archive-date=2016-03-07 |url-status=dead }}</ref>{{rp|219–230}}
{| class="wikitable" style="background:white"
|+ Table of differential entropies
|-
! Distribution Name !! Probability density function (pdf) !! Entropy in [[Nat (unit)|nat]]s !! Support
|-
| [[Uniform distribution (continuous)|Uniform]] || <math>f(x) = \frac{1}{b-a}</math> || <math>\ln(b - a) \,</math> ||<math>[a,b]\,</math>
|-
| [[Normal distribution|Normal]] || <math>f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)</math> || <math>\ln\left(\sigma\sqrt{2\,\pi\,e}\right) </math>||<math>(-\infty,\infty)\,</math>
|-
| [[Exponential distribution|Exponential]] || <math>f(x) = \lambda \exp\left(-\lambda x\right)</math> || <math>1 - \ln \lambda \, </math>||<math>[0,\infty)\,</math>
|-
| [[Rayleigh distribution|Rayleigh]] || <math>f(x) = \frac{x}{\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right)</math> || <math>1 + \ln \frac{\sigma}{\sqrt{2}} + \frac{\gamma_E}{2}</math>||<math>[0,\infty)\,</math>
|-
| [[Beta distribution|Beta]] || <math>f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}</math> for <math>0 \leq x \leq 1</math> || <math> \ln B(\alpha,\beta) - (\alpha-1)[\psi(\alpha) - \psi(\alpha +\beta)]\,</math><br /><math>- (\beta-1)[\psi(\beta) - \psi(\alpha + \beta)] \, </math>||<math>[0,1]\,</math>
|-
| [[Cauchy distribution|Cauchy]] || <math>f(x) = \frac{\gamma}{\pi} \frac{1}{\gamma^2 + x^2}</math> || <math>\ln(4\pi\gamma) \, </math>||<math>(-\infty,\infty)\,</math>
|-
| [[Chi distribution|Chi]] || <math>f(x) = \frac{2}{2^{k/2} \Gamma(k/2)} x^{k-1} \exp\left(-\frac{x^2}{2}\right)</math> || <math>\ln{\frac{\Gamma(k/2)}{\sqrt{2}}} - \frac{k-1}{2} \psi\left(\frac{k}{2}\right) + \frac{k}{2}</math>||<math>[0,\infty)\,</math>
|-
| [[Chi-squared distribution|Chi-squared]] || <math>f(x) = \frac{1}{2^{k/2} \Gamma(k/2)} x^{\frac{k}{2}\!-\!1} \exp\left(-\frac{x}{2}\right)</math> || <math>\ln 2\Gamma\left(\frac{k}{2}\right) - \left(1 - \frac{k}{2}\right)\psi\left(\frac{k}{2}\right) + \frac{k}{2}</math>||<math>[0,\infty)\,</math>
|-
| [[Erlang distribution|Erlang]] || <math>f(x) = \frac{\lambda^k}{(k-1)!} x^{k-1} \exp(-\lambda x)</math> || <math>(1-k)\psi(k) + \ln \frac{\Gamma(k)}{\lambda} + k</math>||<math>[0,\infty)\,</math>
|-
| [[F distribution|F]] || <math>f(x) = \frac{n_1^{\frac{n_1}{2}} n_2^{\frac{n_2}{2}}}{B(\frac{n_1}{2},\frac{n_2}{2})} \frac{x^{\frac{n_1}{2} - 1}}{(n_2 + n_1 x)^{\frac{n_1 + n2}{2}}}</math> || <math>\ln \frac{n_1}{n_2} B\left(\frac{n_1}{2},\frac{n_2}{2}\right) + \left(1 - \frac{n_1}{2}\right) \psi\left(\frac{n_1}{2}\right) -</math><br /><math>\left(1 + \frac{n_2}{2}\right)\psi\left(\frac{n_2}{2}\right) + \frac{n_1 + n_2}{2} \psi\left(\frac{n_1\!+\!n_2}{2}\right)</math>||<math>[0,\infty)\,</math>
|-
| [[Gamma distribution|Gamma]] || <math>f(x) = \frac{x^{k - 1} \exp(-\frac{x}{\theta})}{\theta^k \Gamma(k)}</math> || <math>\ln(\theta \Gamma(k)) + (1 - k)\psi(k) + k \, </math>||<math>[0,\infty)\,</math>
|-
| [[Laplace distribution|Laplace]] || <math>f(x) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right)</math> || <math>1 + \ln(2b) \, </math>||<math>(-\infty,\infty)\,</math>
|-
| [[Logistic distribution|Logistic]] || <math>f(x) = \frac{e^{-x}}{(1 + e^{-x})^2}</math> || <math>2 \, </math>||<math>(-\infty,\infty)\,</math>
|-
| [[Log-normal distribution|Lognormal]] || <math>f(x) = \frac{1}{\sigma x \sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)</math> || <math>\mu + \frac{1}{2} \ln(2\pi e \sigma^2)</math>||<math>[0,\infty)\,</math>
|-
| [[Maxwell–Boltzmann distribution|Maxwell–Boltzmann]] || <math>f(x) = \frac{1}{a^3}\sqrt{\frac{2}{\pi}}\,x^{2}\exp\left(-\frac{x^2}{2a^2}\right)</math> || <math>\ln(a\sqrt{2\pi})+\gamma_E-\frac{1}{2}</math>||<math>[0,\infty)\,</math>
|-
| [[Generalized Gaussian distribution|Generalized normal]] || <math>f(x) = \frac{2 \beta^{\frac{\alpha}{2}}}{\Gamma(\frac{\alpha}{2})} x^{\alpha - 1} \exp(-\beta x^2)</math> || <math>\ln{\frac{\Gamma(\alpha/2)}{2\beta^{\frac{1}{2}}}} - \frac{\alpha - 1}{2} \psi\left(\frac{\alpha}{2}\right) + \frac{\alpha}{2}</math>||<math>(-\infty,\infty)\,</math>
|-
| [[Pareto distribution|Pareto]] || <math>f(x) = \frac{\alpha x_m^\alpha}{x^{\alpha+1}}</math> || <math>\ln \frac{x_m}{\alpha} + 1 + \frac{1}{\alpha}</math>||<math>[x_m,\infty)\,</math>
|-
| [[Student's t-distribution|Student's t]] || <math>f(x) = \frac{(1 + x^2/\nu)^{-\frac{\nu+1}{2}}}{\sqrt{\nu}B(\frac{1}{2},\frac{\nu}{2})}</math> || <math>\frac{\nu\!+\!1}{2}\left(\psi\left(\frac{\nu\!+\!1}{2}\right)\!-\!\psi\left(\frac{\nu}{2}\right)\right)\!+\!\ln \sqrt{\nu} B\left(\frac{1}{2},\frac{\nu}{2}\right)</math>||<math>(-\infty,\infty)\,</math>
|-
| [[Triangular distribution|Triangular]] || <math> f(x) = \begin{cases}
\frac{2(x-a)}{(b-a)(c-a)} & \mathrm{for\ } a \le x \leq c, \\[4pt]
\frac{2(b-x)}{(b-a)(b-c)} & \mathrm{for\ } c < x \le b, \\[4pt]
\end{cases}</math> || <math>\frac{1}{2} + \ln \frac{b-a}{2}</math>||<math>[0,1]\,</math>
|-
| [[Weibull distribution|Weibull]] || <math>f(x) = \frac{k}{\lambda^k} x^{k-1} \exp\left(-\frac{x^k}{\lambda^k}\right)</math> || <math>\frac{(k-1)\gamma_E}{k} + \ln \frac{\lambda}{k} + 1</math>||<math>[0,\infty)\,</math>
|-
| [[Multivariate normal distribution|Multivariate normal]] || <math>
f_X(\vec{x}) =</math><br /><math> \frac{\exp \left( -\frac{1}{2} ( \vec{x} - \vec{\mu})^\top \Sigma^{-1}\cdot(\vec{x} - \vec{\mu}) \right)} {(2\pi)^{N/2} \left|\Sigma\right|^{1/2}}</math> || <math>\frac{1}{2}\ln\{(2\pi e)^{N} \det(\Sigma)\}</math>||<math>\mathbb{R}^N</math>
|}
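A couple of the closed forms above can be spot-checked against SciPy's built-in differential entropies (a sketch; the parameter values are arbitrary illustrative choices):

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

gamma_E = np.euler_gamma
sigma, b = 2.0, 1.5   # illustrative Rayleigh scale and Laplace diversity parameters

# Rayleigh row: 1 + ln(sigma/sqrt(2)) + gamma_E/2
print(stats.rayleigh(scale=sigma).entropy(), 1 + np.log(sigma / np.sqrt(2)) + gamma_E / 2)

# Laplace row: 1 + ln(2b)
print(stats.laplace(scale=b).entropy(), 1 + np.log(2 * b))
</syntaxhighlight>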
| | | |
Many of the differential entropies in the table are taken from<ref name="lazorathie">{{cite journal|author=Lazo, A. and P. Rathie|title=On the entropy of continuous probability distributions|journal=IEEE Transactions on Information Theory|year=1978|volume=24 |issue=1|doi=10.1109/TIT.1978.1055832|pages=120–122}}</ref>{{rp|120–122}}.

---------

[[Category:熵和信息]]
[[Category:信息论]]
[[Category:统计的随机性]]