概率密度函数


文件:Boxplot vs PDF.svg
Boxplot and probability density function of a normal distribution N(0, σ2). 正态分布 N(0, σ2) 的箱线图和概率密度函数。

In probability theory, a probability density function (PDF), or density of a continuous random variable, is a function whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) can be interpreted as providing a relative likelihood that the value of the random variable would equal that sample.[1] In other words, while the absolute likelihood for a continuous random variable to take on any particular value is 0 (since there are an infinite set of possible values to begin with), the value of the PDF at two different samples can be used to infer, in any particular draw of the random variable, how much more likely it is that the random variable would equal one sample compared to the other sample.

在概率论中,概率密度函数(probability density function,PDF),或称连续型随机变量(continuous random variable)的密度,是这样一个函数:它在样本空间(随机变量可能取值的集合)中任意给定样本(或点)处的取值,可以被解释为随机变量等于该样本的相对可能性。[1] 换句话说,虽然连续型随机变量取任何特定值的绝对可能性都是0(因为可能的取值本来就有无穷多个),但在随机变量的任意一次具体抽样中,可以利用PDF在两个不同样本处的取值来推断:随机变量等于其中一个样本的可能性比等于另一个样本的可能性大多少。

In a more precise sense, the PDF is used to specify the probability of the random variable falling within a particular range of values, as opposed to taking on any one value. This probability is given by the integral of this variable's PDF over that range—that is, it is given by the area under the density function but above the horizontal axis and between the lowest and greatest values of the range. The probability density function is nonnegative everywhere, and its integral over the entire space is equal to 1.

在更精确的意义上,PDF用于指定随机变量落在某个特定取值范围内的概率,而不是取某一个具体值的概率。这个概率由该变量的PDF在这一范围上的积分给出——也就是说,它等于密度函数曲线之下、横轴之上、介于该范围最小值与最大值之间的面积。概率密度函数处处非负,且它在整个空间上的积分等于1。

The terms "probability distribution function"[2] and "probability function"[3] have also sometimes been used to denote the probability density function. However, this use is not standard among probabilists and statisticians. In other sources, "probability distribution function" may be used when the probability distribution is defined as a function over general sets of values or it may refer to the cumulative distribution function, or it may be a probability mass function (PMF) rather than the density. "Density function" itself is also used for the probability mass function, leading to further confusion.[4] In general though, the PMF is used in the context of discrete random variables (random variables that take values on a countable set), while the PDF is used in the context of continuous random variables.

The terms "probability distribution function" and "probability function" have also sometimes been used to denote the probability density function. However, this use is not standard among probabilists and statisticians. In other sources, "probability distribution function" may be used when the probability distribution is defined as a function over general sets of values or it may refer to the cumulative distribution function, or it may be a probability mass function (PMF) rather than the density. "Density function" itself is also used for the probability mass function, leading to further confusion. In general though, the PMF is used in the context of discrete random variables (random variables that take values on a countable set), while the PDF is used in the context of continuous random variables.

"概率分布函数"[2] 和 "概率函数"[3] 两个词有时也被用来表示概率密度函数。然而,这种用法在概率论统计学领域中并不标准。在其他资料中,当概率分布被定义为一般数值集上的函数时,可以使用 "概率分布函数"这个词,或者它指的也可以是累积分布函数,或者它可以是概率质量函数(PMF)而不是密度。而"密度函数"本身也被用于概率质量函数,这导致了进一步的混淆。[4] 不过一般来说,PMF是在离散型随机变量(在可数集上取值的随机变量)的背景下使用的,而PDF是在连续型随机变量的背景下使用的。

Example 示例

Suppose bacteria of a certain species typically live 4 to 6 hours. The probability that a bacterium lives exactly 5 hours is equal to zero. A lot of bacteria live for approximately 5 hours, but there is no chance that any given bacterium dies at exactly 5.0000000000... hours. However, the probability that the bacterium dies between 5 hours and 5.01 hours is quantifiable. Suppose the answer is 0.02 (i.e., 2%). Then, the probability that the bacterium dies between 5 hours and 5.001 hours should be about 0.002, since this time interval is one-tenth as long as the previous. The probability that the bacterium dies between 5 hours and 5.0001 hours should be about 0.0002, and so on.

假设某种细菌通常能够存活4到6个小时。一个细菌恰好存活5个小时的概率等于零。很多细菌的寿命大约为5个小时,但没有任何一个特定的细菌会正好在5.0000000000...个小时时死亡。然而,该细菌在5个小时和5.01个小时之间死亡的概率是可以量化的。假设答案是0.02(即2%)。那么,该细菌在5个小时和5.001个小时之间死亡的概率应该约为0.002,因为这个时间间隔是前一个时间间隔的十分之一。该细菌在5个小时和5.0001个小时之间死亡的概率应该约为0.0002,以此类推。

In these three examples, the ratio (probability of dying during an interval) / (duration of the interval) is approximately constant, and equal to 2 per hour (or 2 hour−1). For example, there is 0.02 probability of dying in the 0.01-hour interval between 5 and 5.01 hours, and (0.02 probability / 0.01 hours) = 2 hour−1. This quantity 2 hour−1 is called the probability density for dying at around 5 hours. Therefore, the probability that the bacterium dies at 5 hours can be written as (2 hour−1) dt. This is the probability that the bacterium dies within an infinitesimal window of time around 5 hours, where dt is the duration of this window. For example, the probability that it lives longer than 5 hours, but shorter than (5 hours + 1 nanosecond), is (2 hour−1)×(1 nanosecond) ≈ 6×10−13 (using the unit conversion 3.6×1012 nanoseconds = 1 hour).

在这三个例子中,(在某一间隔期内死亡的概率)/(间隔期的持续时间)的比率大约是恒定的,并且等于2/小时(或2小时−1)。例如,在5和5.01个小时之间的0.01个小时的间隔中,有0.02的死亡概率,(0.02概率/0.01个小时)=2小时−1。这个2小时−1的量被称为在5小时左右死亡的概率密度。因此,该细菌在5小时时死亡的概率可以写成(2小时−1)dt。这是细菌在5小时左右的一个无限小的时间区间内死亡的概率,其中dt是这个区间的时长。例如,它的寿命长于5小时但短于(5小时+1纳秒)的概率是(2小时−1)×(1纳秒)≈6×10−13(使用单位换算:3.6×1012 纳秒 = 1 小时)。

There is a probability density function f with f(5 hours) = 2 hour−1. The integral of f over any window of time (not only infinitesimal windows but also large windows) is the probability that the bacterium dies in that window.

存在一个概率密度函数 f,满足 f(5小时) = 2小时−1。f 在任何时间区间(不仅是无限小的区间,也包括较大的区间)上的积分,就是细菌在该区间内死亡的概率。
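A minimal numerical sketch of the reasoning above (added here for illustration; not part of the original entry), assuming a constant density of 2 per hour near the 5-hour mark:
下面是对上述推理的一个极简数值演示(为便于说明而补充,非原文内容),假设在5小时附近密度恒为每小时2:

<syntaxhighlight lang="python">
# Near t = 5 hours the density is assumed constant: f(5) = 2 per hour,
# so P(die in [5, 5 + dt]) ≈ f(5) * dt for small windows dt.
f_at_5 = 2.0  # probability density at 5 hours, in 1/hour

for dt in (0.01, 0.001, 0.0001):   # window lengths, in hours
    print(f"P(death in [5, 5 + {dt}] h) ≈ {f_at_5 * dt}")
# Prints 0.02, 0.002, 0.0002 — the values used in the text.
</syntaxhighlight>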

Absolutely continuous univariate distributions 绝对连续的单变量分布

A probability density function is most commonly associated with absolutely continuous univariate distributions. A random variable [math]\displaystyle{ X }[/math] has density [math]\displaystyle{ f_X }[/math], where [math]\displaystyle{ f_X }[/math] is a non-negative Lebesgue-integrable function, if:

概率密度函数最常与绝对连续的单变量分布联系在一起。如果下式成立,则称随机变量[math]\displaystyle{ X }[/math]具有密度[math]\displaystyle{ f_X }[/math],其中[math]\displaystyle{ f_X }[/math]是一个非负的勒贝格可积函数(Lebesgue-integrable function):


[math]\displaystyle{ \Pr [a \le X \le b] = \int_a^b f_X(x) \, dx . }[/math]

Hence, if [math]\displaystyle{ F_X }[/math] is the cumulative distribution function of [math]\displaystyle{ X }[/math], then:

因此,如果[math]\displaystyle{ F_X }[/math]是[math]\displaystyle{ X }[/math]的累积分布函数,那么:

[math]\displaystyle{ F_X(x) = \int_{-\infty}^x f_X(u) \, du , }[/math]

and (if [math]\displaystyle{ f_X }[/math] is continuous at [math]\displaystyle{ x }[/math])

而且(如果[math]\displaystyle{ f_X }[/math]在[math]\displaystyle{ x }[/math]处连续):

[math]\displaystyle{ f_X(x) = \frac{d}{dx} F_X(x) . }[/math]

Intuitively, one can think of [math]\displaystyle{ f_X(x) \, dx }[/math] as being the probability of [math]\displaystyle{ X }[/math] falling within the infinitesimal interval [math]\displaystyle{ [x,x+dx] }[/math].

直观地说,我们可以把[math]\displaystyle{ f_X(x) \, dx }[/math]看作是 [math]\displaystyle{ X }[/math]落入无限小区间[math]\displaystyle{ [x,x+dx] }[/math]的概率。
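As a quick check of these relations (added for illustration, using SciPy's standard normal as the example density; not from the original entry):
作为对上述关系的一个简单验证(为便于说明而补充,以SciPy中的标准正态密度为例,非原文内容):

<syntaxhighlight lang="python">
# Checks Pr[a <= X <= b] = ∫_a^b f_X(x) dx  and  f_X(x) = dF_X/dx
# for a standard normal X.
from scipy.stats import norm
from scipy.integrate import quad

a, b = -1.0, 2.0
prob_from_cdf = norm.cdf(b) - norm.cdf(a)      # F_X(b) - F_X(a)
prob_from_pdf, _ = quad(norm.pdf, a, b)        # ∫_a^b f_X(x) dx
print(prob_from_cdf, prob_from_pdf)            # both ≈ 0.8186

x, h = 0.5, 1e-6                               # numerical derivative of the CDF
print((norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h), norm.pdf(x))  # both ≈ 0.3521
</syntaxhighlight>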

Formal definition 正式定义

(This definition may be extended to any probability distribution using the measure-theoretic definition of probability.)

(这个定义可以借助概率的测度理论定义扩展到任何概率分布。)

A random variable [math]\displaystyle{ X }[/math] with values in a measurable space [math]\displaystyle{ (\mathcal{X}, \mathcal{A}) }[/math] (usually [math]\displaystyle{ \mathbb{R}^n }[/math] with the Borel sets as measurable subsets) has as probability distribution the pushforward measure X∗P on [math]\displaystyle{ (\mathcal{X}, \mathcal{A}) }[/math]: the density of [math]\displaystyle{ X }[/math] with respect to a reference measure [math]\displaystyle{ \mu }[/math] on [math]\displaystyle{ (\mathcal{X}, \mathcal{A}) }[/math] is the Radon–Nikodym derivative:

一个取值于可测空间[math]\displaystyle{ (\mathcal{X}, \mathcal{A}) }[/math](通常是以博雷尔集为可测子集的[math]\displaystyle{ \mathbb{R}^n }[/math])的随机变量[math]\displaystyle{ X }[/math],其概率分布是[math]\displaystyle{ (\mathcal{X}, \mathcal{A}) }[/math]上的前推测度X∗P;[math]\displaystyle{ X }[/math]相对于[math]\displaystyle{ (\mathcal{X}, \mathcal{A}) }[/math]上参考测度[math]\displaystyle{ \mu }[/math]的密度,就是拉东-尼科迪姆导数(Radon–Nikodym derivative):

[math]\displaystyle{ f = \frac{d X_*P}{d \mu} . }[/math]

That is, f is any measurable function with the property that:

也就是说,f 是任意一个具有以下属性的可测函数:

[math]\displaystyle{ \Pr [X \in A ] = \int_{X^{-1}A} \, d P = \int_A f \, d \mu }[/math]


for any measurable set [math]\displaystyle{ A \in \mathcal{A}. }[/math]

对于任何可测的集合[math]\displaystyle{ A \in \mathcal{A} }[/math]


Discussion 讨论

In the continuous univariate case above, the reference measure is the Lebesgue measure. The probability mass function of a discrete random variable is the density with respect to the counting measure over the sample space (usually the set of integers, or some subset thereof).

在上述连续单变量的情况下,参考度量是勒贝格测度 Lebesgue measure。离散型随机变量的概率质量函数是样本空间(通常是整数集,或其某个子集)上的计数测度的密度。

It is not possible to define a density with reference to an arbitrary measure (e.g. one can't choose the counting measure as a reference for a continuous random variable). Furthermore, when it does exist, the density is almost everywhere unique.

不可能参照任意的一个测度来定义密度(例如,不能选择计数测度作为连续型随机变量的参考测度)。此外,当密度确实存在时,它在几乎处处的意义下是唯一的。

Further details 更多细节

Unlike a probability, a probability density function can take on values greater than one; for example, the uniform distribution on the interval [0, ½] has probability density f(x) = 2 for 0 ≤ x ≤ ½ and f(x) = 0 elsewhere.

与概率不同,概率密度函数可以取大于1的值;例如,在区间[0, ½]上的均匀分布,在0≤x≤½时,概率密度f(x)=2,在其他地方 f(x) = 0 。
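As a quick check (added for illustration), this density integrates to 1 even though its value exceeds 1:
作为一个简单的验证(为便于说明而补充),尽管该密度的取值超过1,它的积分仍然等于1:

[math]\displaystyle{ \int_0^{1/2} 2 \, dx = 2 \cdot \tfrac{1}{2} = 1 . }[/math]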

The standard normal distribution has probability density

标准正态分布的概率密度为

[math]\displaystyle{ f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}. }[/math]

If a random variable X is given and its distribution admits a probability density function f, then the expected value of X (if the expected value exists) can be calculated as

如果给定一个随机变量X ,并且其分布允许存在一个概率密度函数f,那么X 的期望值(如果期望值存在)可以被计算为:

[math]\displaystyle{ \operatorname{E}[X] = \int_{-\infty}^\infty x\,f(x)\,dx. }[/math]
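A short numerical sketch of this formula (added for illustration; the density f below is the standard normal defined above):
下面是对这一公式的简短数值演示(为便于说明而补充;其中的密度 f 即上文定义的标准正态密度):

<syntaxhighlight lang="python">
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density

mean, _ = quad(lambda x: x * f(x), -np.inf, np.inf)              # E[X]
second_moment, _ = quad(lambda x: x**2 * f(x), -np.inf, np.inf)  # E[X^2]
print(mean, second_moment)   # ≈ 0.0 and 1.0, so the variance is also 1
</syntaxhighlight>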

Not every probability distribution has a density function: the distributions of discrete random variables do not; nor does the Cantor distribution, even though it has no discrete component, i.e., does not assign positive probability to any individual point.

并非每个概率分布都有密度函数:离散型随机变量的分布没有;康托尔分布(Cantor distribution)也没有,尽管它没有离散成分,即它不给任何单个点赋予正概率。

A distribution has a density function if and only if its cumulative distribution function F(x) is absolutely continuous. In this case: F is almost everywhere differentiable, and its derivative can be used as probability density:

一个分布具有密度函数,当且仅当其累积分布函数F(x)是绝对连续的。在这种情况下,F几乎处处可微,其导数可以用作概率密度:

[math]\displaystyle{ \frac{d}{dx}F(x) = f(x). }[/math]


If a probability distribution admits a density, then the probability of every one-point set {a} is zero; the same holds for finite and countable sets.

如果一个概率分布有一个密度,那么每一个单点集 {a} 的概率都是零;对于有限集和可数集也是如此。

Two probability densities f and g represent the same probability distribution precisely if they differ only on a set of Lebesgue measure zero.

如果两个概率密度fg只在勒贝格测度(Lebesgue measure)为零的集合上有差异,那么它们恰好代表了同一个概率分布。

In the field of statistical physics, a non-formal reformulation of the relation above between the derivative of the cumulative distribution function and the probability density function is generally used as the definition of the probability density function. This alternate definition is the following:

在统计物理学领域,一般采用上述累积分布函数的导数与概率密度函数之间关系的非形式化重新表述,作为概率密度函数的定义。这种替代定义如下:

If dt is an infinitely small number, the probability that X is included within the interval (t, t + dt) is equal to f(t) dt, or:

如果 dt 是一个无穷小的数,则 X 落在区间 (t, t + dt) 内的概率等于 f(t) dt,或:

[math]\displaystyle{ \Pr(t\lt X\lt t+dt) = f(t)\,dt. }[/math]

Link between discrete and continuous distributions 离散和连续分布之间的联系

It is possible to represent certain discrete random variables as well as random variables involving both a continuous and a discrete part with a generalized probability density function, by using the Dirac delta function. (This is not possible with a probability density function in the sense defined above, it may be done with a distribution.) For example, consider a binary discrete random variable having the Rademacher distribution—that is, taking −1 or 1 for values, with probability ½ each. The density of probability associated with this variable is:

通过使用狄拉克δ函数(Dirac delta function),可以用广义的概率密度函数来表示某些离散型随机变量,以及同时含有连续部分和离散部分的随机变量。(这用上面定义的概率密度函数是做不到的,但可以用分布(distribution)来实现。)例如,考虑一个服从拉德马赫分布(Rademacher distribution)的二元离散随机变量——即以各½的概率取值−1或1。与这个变量相关联的概率密度是:

[math]\displaystyle{ f(t) = \frac{1}{2}(\delta(t+1)+\delta(t-1)). }[/math]

More generally, if a discrete variable can take n different values among real numbers, then the associated probability density function is:

更一般地说,如果一个离散变量可以在实数中取n个不同的值,那么相关的概率密度函数就是:

[math]\displaystyle{ f(t) = \sum_{i=1}^np_i\, \delta(t-x_i), }[/math]



where [math]\displaystyle{ x_1,\ldots,x_n }[/math] are the discrete values accessible to the variable and [math]\displaystyle{ p_1,\ldots,p_n }[/math] are the probabilities associated with these values.

其中,[math]\displaystyle{ x_1,\ldots,x_n }[/math]是变量可以取到的离散值,而[math]\displaystyle{ p_1,\ldots,p_n }[/math]是与这些值相关联的概率。


This substantially unifies the treatment of discrete and continuous probability distributions. For instance, the above expression allows for determining statistical characteristics of such a discrete variable (such as its mean, its variance and its kurtosis), starting from the formulas given for a continuous distribution of the probability.

这实质上统一了离散型与连续型概率分布的处理。例如,从连续型概率分布给出的公式出发,上述表达式可以用来确定这种离散变量的统计特征(如均值、方差和峰度)。
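For example (worked out here for illustration), applying the continuous-case formula for the mean to the Rademacher density above gives
例如(此处为便于说明而写出的推导),把连续情形下均值的公式应用于上面的拉德马赫密度,可得

[math]\displaystyle{ \operatorname{E}[X] = \int_{-\infty}^{\infty} t\,f(t)\,dt = \int_{-\infty}^{\infty} t \cdot \tfrac{1}{2}\big(\delta(t+1)+\delta(t-1)\big)\,dt = \tfrac{1}{2}(-1) + \tfrac{1}{2}(1) = 0 , }[/math]

and similarly [math]\displaystyle{ \operatorname{E}[X^2] = 1 }[/math], so the variance is 1.
类似地,[math]\displaystyle{ \operatorname{E}[X^2] = 1 }[/math],因此方差为1。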

Families of densities 密度族

It is common for probability density functions (and probability mass functions) to be parametrized—that is, to be characterized by unspecified parameters. For example, the normal distribution is parametrized in terms of the mean and the variance, denoted by [math]\displaystyle{ \mu }[/math] and [math]\displaystyle{ \sigma^2 }[/math] respectively, giving the family of densities.

概率密度函数(和概率质量函数)被参数化是很常见的,也就是说,它们会被未指定的参数所描述。例如,正态分布是以均值和方差为参数的,分别用[math]\displaystyle{ \mu }[/math][math]\displaystyle{ \sigma^2 }[/math]表示,给出密度族的定义:

[math]\displaystyle{ f(x;\mu,\sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2 }. }[/math]


It is important to keep in mind the difference between the domain of a family of densities and the parameters of the family. Different values of the parameters describe different distributions of different random variables on the same sample space (the same set of all possible values of the variable); this sample space is the domain of the family of random variables that this family of distributions describes. A given set of parameters describes a single distribution within the family sharing the functional form of the density. From the perspective of a given distribution, the parameters are constants, and terms in a density function that contain only parameters, but not variables, are part of the normalization factor of a distribution (the multiplicative factor that ensures that the area under the density—the probability of something in the domain occurring— equals 1). This normalization factor is outside the kernel of the distribution.

必须要牢记密度族的定义域和该族的参数之间的区别。参数的不同取值描述了同一样本空间(即变量所有可能取值构成的同一集合)上不同随机变量的不同分布;这个样本空间就是该分布族所描述的随机变量族的定义域。一组给定的参数描述了该族中共享同一密度函数形式的单个分布。从给定分布的角度看,参数是常数,而密度函数中只包含参数而不包含变量的项,属于该分布的归一化因子(即确保密度下的面积——定义域中某个取值出现的概率——等于1的乘法因子)。这个归一化因子在分布的核(kernel)之外。

Since the parameters are constants, reparametrizing a density in terms of different parameters, to give a characterization of a different random variable in the family, means simply substituting the new parameter values into the formula in place of the old ones. Changing the domain of a probability density, however, is trickier and requires more work: see the section below on change of variables.

由于这些参数是常数,用不同的参数值对密度进行重新参数化,从而刻画族中另一个随机变量,只意味着把新的参数值代入公式以取代旧的参数值。然而,改变概率密度的定义域则更为棘手,需要更多的工作:见下文关于变量变换的部分。

Densities associated with multiple variables 与多个变量相关的密度

For continuous random variables X1, ..., Xn, it is also possible to define a probability density function associated to the set as a whole, often called joint probability density function. This density function is defined as a function of the n variables, such that, for any domain D in the n-dimensional space of the values of the variables X1, ..., Xn, the probability that a realisation of the set variables falls inside the domain D is

对于连续随机变量X1, ..., Xn,也可以定义一个与整个集合相关的概率密度函数,通常称为联合概率密度函数 joint probability density function。这个密度函数被定义为n 个变量的函数,这样,对于变量X1, ..., Xn的值的n 维空间中的任何域D,集合变量的实现落在域D 内的概率为

[math]\displaystyle{ \Pr \left( X_1,\ldots,X_n \isin D \right) = \int_D f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)\,dx_1 \cdots dx_n. }[/math]

If F(x1, ..., xn) = Pr(X1 ≤ x1, ..., Xn ≤ xn) is the cumulative distribution function of the vector (X1, ..., Xn), then the joint probability density function can be computed as a partial derivative

如果F(x1, ..., xn) = Pr(X1 ≤ x1, ..., Xn ≤ xn) 是矢量(X1, ..., Xn)的累积分布函数,那么联合概率密度函数可以被计算为偏导数

[math]\displaystyle{ f(x) = \left.\frac{\partial^n F}{\partial x_1 \cdots \partial x_n} \right|_x }[/math]

Marginal densities 边际密度

For i = 1, 2, ...,n, let fXi(xi) be the probability density function associated with variable Xi alone. This is called the marginal density function, and can be deduced from the probability density associated with the random variables X1, ..., Xn by integrating over all values of the other n − 1 variables:

对于i = 1, 2, ...,n,让fXi(xi) 是与变量Xi 单独相关的概率密度函数。这被称为边际密度函数,可以从与随机变量X1, ..., Xn 相关的概率密度中推导出来,方法是对其他n − 1个变量的所有数值进行积分:

[math]\displaystyle{ f_{X_i}(x_i) = \int f(x_1,\ldots,x_n)\, dx_1 \cdots dx_{i-1}\,dx_{i+1}\cdots dx_n . }[/math]
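As an illustrative example (not in the original entry), take the joint density [math]\displaystyle{ f(x,y) = e^{-x-y} }[/math] for [math]\displaystyle{ x, y \ge 0 }[/math]; integrating out [math]\displaystyle{ y }[/math] gives the marginal density of [math]\displaystyle{ X }[/math]:
作为一个说明性的例子(非原文内容),取联合密度[math]\displaystyle{ f(x,y) = e^{-x-y} }[/math](其中[math]\displaystyle{ x, y \ge 0 }[/math]);对[math]\displaystyle{ y }[/math]积分即得[math]\displaystyle{ X }[/math]的边际密度:

[math]\displaystyle{ f_X(x) = \int_0^\infty e^{-x-y}\,dy = e^{-x}, \qquad x \ge 0 . }[/math]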



Independence 独立性

Continuous random variables X1, ..., Xn admitting a joint density are all independent from each other if and only if

具备联合密度的连续随机变量X1, ..., Xn 都是相互独立的,当且仅当

[math]\displaystyle{ f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = f_{X_1}(x_1)\cdots f_{X_n}(x_n). }[/math]

Corollary 推论

If the joint probability density function of a vector of n random variables can be factored into a product of n functions of one variable

如果一个由n个随机变量组成的矢量的联合概率密度函数,可以被分解为一个变量的n个函数的乘积:

[math]\displaystyle{ f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = f_1(x_1)\cdots f_n(x_n), }[/math]



(where each fi is not necessarily a density) then the n variables in the set are all independent from each other, and the marginal probability density function of each of them is given by

(其中每个 fi 不一定是密度),那么集合中的n 个变量都是相互独立的,其中每个变量的边际概率密度函数为:

[math]\displaystyle{ f_{X_i}(x_i) = \frac{f_i(x_i)}{\int f_i(x)\,dx}. }[/math]


Example 示例

This elementary example illustrates the above definition of multidimensional probability density functions in the simple case of a function of a set of two variables. Let us call [math]\displaystyle{ \vec R }[/math] a 2-dimensional random vector of coordinates (X, Y): the probability to obtain [math]\displaystyle{ \vec R }[/math] in the quarter plane of positive x and y is

这个初等例子在一组两个变量的函数这一简单情形下,说明了上述多维概率密度函数的定义。设[math]\displaystyle{ \vec R }[/math]是坐标为(X, Y)的二维随机向量:[math]\displaystyle{ \vec R }[/math]落在x、y均为正的四分之一平面内的概率是:


[math]\displaystyle{ \Pr \left( X \gt 0, Y \gt 0 \right) = \int_0^\infty \int_0^\infty f_{X,Y}(x,y)\,dx\,dy. }[/math]

Function of random variables and change of variables in the probability density function 随机变量的函数和概率密度函数中的变量变化

If the probability density function of a random variable (or vector) X is given as fX(x), it is possible (but often not necessary; see below) to calculate the probability density function of some variable Y = g(X). This is also called a “change of variable” and is in practice used to generate a random variable of arbitrary shape fg(X) = fY using a known (for instance, uniform) random number generator.

如果一个随机变量(或向量)X的概率密度函数为fX(x),那么有可能(但往往没有必要,见下文)计算出某个变量Y = g(X)的概率密度函数。这也被称为"变量变换",在实践中用于借助已知的(例如均匀分布的)随机数生成器,生成具有任意形状密度fg(X) = fY的随机变量。


It is tempting to think that in order to find the expected value E(g(X)), one must first find the probability density fg(X) of the new random variable Y = g(X). However, rather than computing

人们很容易想到,为了找到期望值E(g(X)),必须首先找到新的随机变量Y = g(X) 的概率密度fg(X)。然而,不是计算

[math]\displaystyle{ \operatorname E\big(g(X)\big) = \int_{-\infty}^\infty y f_{g(X)}(y)\,dy, }[/math]


one may find instead

我们可以转而计算:


[math]\displaystyle{ \operatorname E\big(g(X)\big) = \int_{-\infty}^\infty g(x) f_X(x)\,dx. }[/math]

The values of the two integrals are the same in all cases in which both X and g(X) actually have probability density functions. It is not necessary that g be a one-to-one function. In some cases the latter integral is computed much more easily than the former. See Law of the unconscious statistician.

Xg(X) 实际上在所有有概率密度函数的情况下,两个积分的值都是一样的。g 不一定是单射。在某些情况下,后者的积分比前者更容易计算。见无意识统计学家法则(Law of the unconscious statistician)词条。

Scalar to scalar 标量到标量

Let [math]\displaystyle{ g:{\mathbb R} \rightarrow {\mathbb R} }[/math] be a monotonic function, then the resulting density function is

设[math]\displaystyle{ g:{\mathbb R} \rightarrow {\mathbb R} }[/math]是一个单调函数,那么得到的密度函数是:

[math]\displaystyle{ f_Y(y) =f_X\big(g^{-1}(y)\big) \left| \frac{d}{dy} \big(g^{-1}(y)\big) \right|. }[/math]

Here g−1 denotes the inverse function.

这里 g−1 表示反函数。



This follows from the fact that the probability contained in a differential area must be invariant under change of variables. That is,

这源于这样一个事实,即微分区域所包含的概率在变量变化下必须是不变的。就是说:

[math]\displaystyle{ \left| f_Y(y)\, dy \right| = \left| f_X(x)\, dx \right|, }[/math]


or


[math]\displaystyle{ f_Y(y) = \left| \frac{dx}{dy} \right| f_X(x) = \left| \frac{d}{dy} (x) \right| f_X(x) = \left| \frac{d}{dy} \big(g^{-1}(y)\big) \right| f_X\big(g^{-1}(y)\big) = {\big|\big(g^{-1}\big)'(y)\big|} \cdot f_X\big(g^{-1}(y)\big) . }[/math]
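As a worked instance of the monotonic formula (added for illustration): if [math]\displaystyle{ X }[/math] is standard normal and [math]\displaystyle{ Y = e^X }[/math], then [math]\displaystyle{ g^{-1}(y) = \ln y }[/math] and
作为上述单调情形公式的一个推导示例(为便于说明而补充):若[math]\displaystyle{ X }[/math]为标准正态分布,[math]\displaystyle{ Y = e^X }[/math],则[math]\displaystyle{ g^{-1}(y) = \ln y }[/math],于是

[math]\displaystyle{ f_Y(y) = f_X(\ln y)\left|\frac{d}{dy}\ln y\right| = \frac{1}{y\sqrt{2\pi}}\, e^{-(\ln y)^2/2}, \qquad y \gt 0 , }[/math]

which is the standard log-normal density.
这正是标准对数正态分布的密度。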


For functions that are not monotonic, the probability density function for y is

对于非单调的函数,y 的概率密度函数为

[math]\displaystyle{ \sum_{k=1}^{n(y)} \left| \frac{d}{dy} g^{-1}_{k}(y) \right| \cdot f_X\big(g^{-1}_{k}(y)\big), }[/math]

where n(y) is the number of solutions in x for the equation [math]\displaystyle{ g(x)=y }[/math], and [math]\displaystyle{ g_k^{-1}(y) }[/math] are these solutions.

其中n(y) 是方程[math]\displaystyle{ g(x)=y }[/math]x 中的解的个数,而[math]\displaystyle{ g_k^{-1}(y) }[/math]是它们的解。
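As a worked instance (added for illustration): for [math]\displaystyle{ Y = X^2 }[/math] with [math]\displaystyle{ X }[/math] standard normal, each [math]\displaystyle{ y \gt 0 }[/math] has the two solutions [math]\displaystyle{ g_{1,2}^{-1}(y) = \pm\sqrt{y} }[/math], so
作为一个推导示例(为便于说明而补充):对于[math]\displaystyle{ Y = X^2 }[/math],其中[math]\displaystyle{ X }[/math]为标准正态分布,每个[math]\displaystyle{ y \gt 0 }[/math]都有两个解[math]\displaystyle{ g_{1,2}^{-1}(y) = \pm\sqrt{y} }[/math],于是

[math]\displaystyle{ f_Y(y) = 2 \cdot \frac{1}{2\sqrt{y}}\, f_X(\sqrt{y}) = \frac{1}{\sqrt{2\pi y}}\, e^{-y/2}, \qquad y \gt 0 , }[/math]

which is the chi-square density with one degree of freedom.
即自由度为1的卡方分布密度。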

Vector to vector 向量到向量

The above formulas can be generalized to variables (which we will again call y) depending on more than one other variable. f(x1, ..., xn) shall denote the probability density function of the variables that y depends on, and the dependence shall be y = g(x1, …, xn). Then, the resulting density function is[citation needed]

上述公式可以推广到依赖于多个其他变量的变量(我们仍称之为y)。f(x1, ..., xn)表示y所依赖的那些变量的概率密度函数,依赖关系为y = g(x1, …, xn)。那么,得到的密度函数为

[math]\displaystyle{ \int\limits_{y = g(x_1, \ldots, x_n)} \frac{f(x_1,\ldots, x_n)}{\sqrt{\sum_{j=1}^n \frac{\partial g}{\partial x_j}(x_1, \ldots, x_n)^2}} \,dV, }[/math]

where the integral is over the entire (n − 1)-dimensional solution of the subscripted equation and the symbolic dV must be replaced by a parametrization of this solution for a particular calculation; the variables x1, ..., xn are then of course functions of this parametrization.

其中,积分遍及带下标方程的整个(n − 1)维解集;在具体计算时,符号dV必须用该解集的一个参数化来代替,此时变量x1, ..., xn自然就是这个参数化的函数。


This derives from the following, perhaps more intuitive representation: Suppose x is an n-dimensional random variable with joint density f. If y = H(x), where H is a bijective, differentiable function, then y has density g:

这源于以下也许更直观的表述:假设x是一个具有联合密度fn 维随机变量,如果y = H(x),其中H 是一个双射的、可微的函数,那么y 具有密度g

[math]\displaystyle{ g(\mathbf{y}) = f\Big(H^{-1}(\mathbf{y})\Big)\left\vert \det\left[\frac{dH^{-1}(\mathbf{z})}{d\mathbf{z}}\Bigg \vert_{\mathbf{z}=\mathbf{y}}\right]\right \vert }[/math]


with the differential regarded as the Jacobian of the inverse of H(.), evaluated at y.[5]

其中的微分被看作H(.)的逆映射的雅可比矩阵,在y处求值。[5]


For example, in the 2-dimensional case x = (x1, x2), suppose the transform H is given as y1 = H1(x1, x2), y2 = H2(x1, x2) with inverses x1 = H1−1(y1, y2), x2 = H2−1(y1, y2). The joint distribution for y = (y1, y2) has density[6]

例如,在二维情形x = (x1, x2)中,假设变换H为y1 = H1(x1, x2)、y2 = H2(x1, x2),其逆为x1 = H1−1(y1, y2)、x2 = H2−1(y1, y2)。那么y = (y1, y2)的联合分布具有密度[6]

[math]\displaystyle{ g(y_1,y_2) = f_{X_1,X_2}\big(H_1^{-1}(y_1,y_2), H_2^{-1}(y_1,y_2)\big) \left\vert \frac{\partial H_1^{-1}}{\partial y_1} \frac{\partial H_2^{-1}}{\partial y_2} - \frac{\partial H_1^{-1}}{\partial y_2} \frac{\partial H_2^{-1}}{\partial y_1} \right\vert. }[/math]


Vector to scalar 向量到标量

Let [math]\displaystyle{ V:{\mathbb R}^n \rightarrow {\mathbb R} }[/math] be a differentiable function and [math]\displaystyle{ X }[/math] be a random vector taking values in [math]\displaystyle{ {\mathbb R}^n }[/math], [math]\displaystyle{ f_X(\cdot) }[/math] be the probability density function of [math]\displaystyle{ X }[/math] and [math]\displaystyle{ \delta(\cdot) }[/math] be the Dirac delta function. It is possible to use the formulas above to determine [math]\displaystyle{ f_Y(\cdot) }[/math], the probability density function of [math]\displaystyle{ Y=V(X) }[/math], which will be given by

设[math]\displaystyle{ V:{\mathbb R}^n \rightarrow {\mathbb R} }[/math]是一个可微函数,[math]\displaystyle{ X }[/math]是一个在[math]\displaystyle{ {\mathbb R}^n }[/math]中取值的随机向量,[math]\displaystyle{ f_X(\cdot) }[/math]是[math]\displaystyle{ X }[/math]的概率密度函数,[math]\displaystyle{ \delta(\cdot) }[/math]是狄拉克δ函数(Dirac delta function)。可以使用上述公式来确定[math]\displaystyle{ f_Y(\cdot) }[/math],即[math]\displaystyle{ Y=V(X) }[/math]的概率密度函数,它由以下公式给出:


[math]\displaystyle{ f_Y(y) = \int_{{\mathbb R}^n} f_{X}(\mathbf{x}) \delta\big(y - V(\mathbf{x})\big) \,d \mathbf{x}. }[/math]


This result leads to the Law of the unconscious statistician:

这个结果引出了无意识统计学家法则:


[math]\displaystyle{ \operatorname{E}_Y[Y]=\int_{{\mathbb R}} y f_Y(y) dy = \int_{{\mathbb R}} y \int_{{\mathbb R}^n} f_{X}(\mathbf{x}) \delta\big(y - V(\mathbf{x})\big) \,d \mathbf{x} dy = \int_{{\mathbb R}^n} \int_{{\mathbb R}} y f_{X}(\mathbf{x}) \delta\big(y - V(\mathbf{x})\big) \, dy d \mathbf{x}= \int_{{\mathbb R}^n} V(\mathbf{x}) f_{X}(\mathbf{x}) d \mathbf{x}=\operatorname{E}_X[V(X)]. }[/math]


Proof:

证明:


Let [math]\displaystyle{ Z }[/math] be a collapsed random variable with probability density function [math]\displaystyle{ p_Z(z)=\delta(z) }[/math] (i.e. a constant equal to zero). Let the random vector [math]\displaystyle{ \tilde{X} }[/math] and the transform [math]\displaystyle{ H }[/math] be defined as

[math]\displaystyle{ Z }[/math]是一个坍缩的随机变量(collapsed random variable),其概率密度函数[math]\displaystyle{ p_Z(z)=delta(z) }[/math](即一个等于0的常数)。设随机向量[math]\displaystyle{ \tilde{X} }[/math]和变换[math]\displaystyle{ H }[/math]定义为:


[math]\displaystyle{ H(Z,X)=\begin{bmatrix} Z+V(X)\\ X\end{bmatrix}=\begin{bmatrix} Y\\ \tilde{X}\end{bmatrix} }[/math].



It is clear that [math]\displaystyle{ H }[/math] is a bijective mapping, and the Jacobian of [math]\displaystyle{ H^{-1} }[/math] is given by:

很明显,[math]\displaystyle{ H }[/math] 是一个双射映射, [math]\displaystyle{ H^{-1} }[/math] 的雅可比矩阵由以下方式给出:

[math]\displaystyle{ \frac{dH^{-1}(y,\tilde{\mathbf{x}})}{dy\,d\tilde{\mathbf{x}}}=\begin{bmatrix} 1 & -\frac{dV(\tilde{\mathbf{x}})}{d\tilde{\mathbf{x}}}\\ \mathbf{0}_{n\times1} & \mathbf{I}_{n\times n} \end{bmatrix} }[/math],


which is an upper triangular matrix with ones on the main diagonal, therefore its determinant is 1. Applying the change of variable theorem from the previous section we obtain that

它是一个上三角矩阵,主对角线上有1,因此它的行列式是1。 应用上一节的变量变化定理,我们可以得到

[math]\displaystyle{ f_{Y,X}(y,x) = f_{X}(\mathbf{x}) \delta\big(y - V(\mathbf{x})\big) }[/math],


which if marginalized over [math]\displaystyle{ x }[/math] leads to the desired probability density function.

如果对[math]\displaystyle{ x }[/math]进行边际化,就会得到我们所需要的概率密度函数。


Sums of independent random variables 独立随机变量之和


The probability density function of the sum of two independent random variables U and V, each of which has a probability density function, is the convolution of their separate density functions:

两个独立随机变量U和V(各自都具有概率密度函数)之和的概率密度函数,是它们各自密度函数的卷积:

[math]\displaystyle{ f_{U+V}(x) = \int_{-\infty}^\infty f_U(y) f_V(x - y)\,dy = \left( f_{U} * f_{V} \right) (x) }[/math]

It is possible to generalize the previous relation to a sum of N independent random variables, with densities U1, ..., UN:

可以把前面的关系推广为N个独立随机变量之和,其密度为U1, ..., UN:

[math]\displaystyle{ f_{U_1 + \cdots + U_N}(x) = \left( f_{U_1} * \cdots * f_{U_N} \right) (x) }[/math]


This can be derived from a two-way change of variables involving Y=U+V and Z=V, similarly to the example below for the quotient of independent random variables.

这可以从关于 Y=U+VZ=V的双向变量变化中推导得出,与下面独立随机变量商的例子类似。
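A numerical sketch of the convolution formula (added for illustration): the density of the sum of two independent Uniform(0, 1) variables is the triangular density on [0, 2].
下面是卷积公式的一个数值演示(为便于说明而补充):两个相互独立的 Uniform(0, 1) 变量之和的密度是 [0, 2] 上的三角形密度。

<syntaxhighlight lang="python">
import numpy as np

dx = 0.001
x = np.arange(0.0, 1.0, dx)
f_u = np.ones_like(x)               # density of U ~ Uniform(0, 1)
f_v = np.ones_like(x)               # density of V ~ Uniform(0, 1)

f_sum = np.convolve(f_u, f_v) * dx  # discrete approximation of (f_U * f_V)(z) on [0, 2]

print(np.sum(f_sum) * dx)           # ≈ 1.0: the convolution is again a density
print(f_sum[int(1.0 / dx)])         # ≈ 1.0: the triangular density peaks at z = 1
</syntaxhighlight>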


Products and quotients of independent random variables 独立随机变量的积与商


Given two independent random variables U and V, each of which has a probability density function, the density of the product Y = UV and quotient Y=U/V can be computed by a change of variables.

给定两个独立的随机变量UV,每个变量都有一个概率密度函数,可以通过变量的变化来计算积Y = UV 和商 Y=U/V 的密度。


Example: Quotient distribution 示例:商的分布

To compute the quotient Y = U/V of two independent random variables U and V, define the following transformation:

为了计算两个独立随机变量 UV的商 Y = U/V,定义以下变换:


[math]\displaystyle{ Y=U/V }[/math]

[math]\displaystyle{ Z=V }[/math]



Then, the joint density p(y,z) can be computed by a change of variables from U,V to Y,Z, and Y can be derived by marginalizing out Z from the joint density.

然后,联合密度p(y,z)可以通过从U,V到Y,Z的变量变换来计算,而Y的分布则可以通过从联合密度中把Z边际化掉来得到。


The inverse transformation is

逆变换是


[math]\displaystyle{ U = YZ }[/math]


[math]\displaystyle{ V = Z }[/math]



The Jacobian matrix [math]\displaystyle{ J(U,V\mid Y,Z) }[/math] of this transformation is

该变换的雅可比矩阵 [math]\displaystyle{ J(U,V\mid Y,Z) }[/math]

[math]\displaystyle{ \left| \det\begin{bmatrix} \frac{\partial u}{\partial y} & \frac{\partial u}{\partial z} \\ \frac{\partial v}{\partial y} & \frac{\partial v}{\partial z} \end{bmatrix} \right| = \left| \det\begin{bmatrix} z & y \\ 0 & 1 \end{bmatrix} \right| = |z| . }[/math]


Thus:

因此:


[math]\displaystyle{ p(y,z) = p(u,v)\,J(u,v\mid y,z) = p(u)\,p(v)\,J(u,v\mid y,z) = p_U(yz)\,p_V(z)\, |z| . }[/math]


And the distribution of Y can be computed by marginalizing out Z:

Y的分布可以通过边际化Z 来计算:


[math]\displaystyle{ p(y) = \int_{-\infty}^\infty p_U(yz)\,p_V(z)\, |z| \, dz }[/math]


This method crucially requires that the transformation from U,V to Y,Z be bijective. The above transformation meets this because Z can be mapped directly back to V, and for a given V the quotient U/V is monotonic. This is similarly the case for the sum U + V, difference U − V and product UV.

这种方法的关键在于要求从U,V到Y,Z的变换是双射的。上述变换满足这一点,因为Z可以直接映射回V,而且对于给定的V,商U/V是单调的。对于和U + V、差U − V以及积UV,情况也类似。


Exactly the same method can be used to compute the distribution of other functions of multiple independent random variables.

同样的方法可以用来计算多个独立随机变量的其他函数的分布。


Example: Quotient of two standard normals 示例:两个标准正态(变量)的商

Given two standard normal variables U and V, the quotient can be computed as follows. First, the variables have the following density functions:

给定两个标准正态变量 UV,商可以按以下方式计算。首先,这两个变量具有以下密度函数:


[math]\displaystyle{ p(u) = \frac{1}{\sqrt{2\pi}} e^{-\frac{u^2}{2}} }[/math]


[math]\displaystyle{ p(v) = \frac{1}{\sqrt{2\pi}} e^{-\frac{v^2}{2}} }[/math]


We transform as described above:

我们如上所述进行变换:



[math]\displaystyle{ Y=U/V }[/math]



[math]\displaystyle{ Z=V }[/math]



This leads to:

那么:


[math]\displaystyle{ \begin{align} p(y) &= \int_{-\infty}^\infty p_U(yz)\,p_V(z)\, |z| \, dz \\[5pt] &= \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} y^2 z^2} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} z^2} |z| \, dz \\[5pt] &= \int_{-\infty}^\infty \frac{1}{2\pi} e^{-\frac{1}{2}\left(y^2+1\right)z^2} |z| \, dz \\[5pt] &= 2\int_0^\infty \frac{1}{2\pi} e^{-\frac{1}{2}\left(y^2+1\right)z^2} z \, dz \\[5pt] &= \int_0^\infty \frac{1}{\pi} e^{-\left(y^2+1\right)u} \, du && u=\tfrac{1}{2}z^2\\[5pt] &= \left. -\frac{1}{\pi \left(y^2+1\right)} e^{-\left(y^2+1\right)u}\right|_{u=0}^\infty \\[5pt] &= \frac{1}{\pi \left(y^2+1\right)} \end{align} }[/math]


This is the density of a standard Cauchy distribution.

这是标准柯西分布的密度。
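A Monte Carlo sketch of this result (added for illustration): the ratio of two independent standard normal samples behaves like a standard Cauchy variable, for which P(|Y| ≤ 1) = 1/2 and the median is 0 (the mean does not exist).
下面是对这一结果的蒙特卡罗演示(为便于说明而补充):两个相互独立的标准正态样本之比的行为与标准柯西变量一致,后者满足 P(|Y| ≤ 1) = 1/2,中位数为0(均值不存在)。

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(1_000_000)
v = rng.standard_normal(1_000_000)
y = u / v                              # ratio of two independent standard normals

print(np.mean(np.abs(y) <= 1.0))       # ≈ 0.5, matching ∫_{-1}^{1} dy / (π(1 + y²))
print(np.median(y))                    # ≈ 0.0, the Cauchy median
</syntaxhighlight>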



References 参考

  1. Grinstead, Charles M.; Snell, J. Laurie (2009). "Conditional Probability - Discrete Conditional". Grinstead & Snell's Introduction to Probability. Orange Grove Texts. ISBN 161610046X. https://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/Chapter4.pdf. Retrieved 2019-07-25.
  2. Probability distribution function, PlanetMath. 互联网档案馆存档,存档日期2011-08-07.
  3. Probability Function at MathWorld.
  4. Ord, J.K. (1972) Families of Frequency Distributions, Griffin. (for example, Table 5.1 and Example 5.4)
  5. Devore, Jay L.; Berk, Kenneth N. (2007). Modern Mathematical Statistics with Applications. Cengage. p. 263. ISBN 0-534-40473-1. https://books.google.com/books?id=3X7Qca6CcfkC&pg=PA263.
  6. David, Stirzaker (2007-01-01). Elementary Probability. Cambridge University Press. ISBN 0521534283. OCLC 851313783.


Bibliography 引用

  • Pierre Simon de Laplace 皮埃尔·西蒙·拉普拉斯 (1812). Analytical Theory of Probability 《概率分析理论》.
    The first major treatise blending calculus with probability theory, originally in French: Théorie Analytique des Probabilités.
    第一部将微积分与概率论相结合的重要论著,最初以法文写成:Théorie Analytique des Probabilités。

  • Andrei Nikolajevich Kolmogorov (1950). Foundations of the Theory of Probability 《概率论的基础》. https://archive.org/details/foundationsofthe00kolm.
    The modern measure-theoretic foundation of probability theory; the original German version (Grundbegriffe der Wahrscheinlichkeitsrechnung) appeared in 1933.
    概率论的现代测度论基础;最初的德文版本(Grundbegriffe der Wahrscheinlichkeitsrechnung)出版于1933年。

  • Patrick Billingsley (1979). Probability and Measure 《概率与测度》. New York, Toronto, London: John Wiley and Sons 约翰·威利父子出版社. ISBN 0-471-00710-2.

  • David Stirzaker (2003). Elementary Probability 《基本概率》. ISBN 0-521-42028-8. https://archive.org/details/elementaryprobab0000stir.
    Chapters 7 to 9 are about continuous variables.
    第七章到第九章是关于连续变量的。

External links 外部链接

  • Weisstein, Eric W. "概率密度函数". MathWorld.


Category:Functions related to probability distributions

类别: 与概率分布有关的函数

Category:Concepts in physics

分类: 物理概念


This page was moved from wikipedia:en:Probability density function. Its edit history can be viewed at 概率密度函数/edithistory