Distribution function


This entry was machine-translated by Caiyun Xiaoyi (彩云小译) and has not yet been manually edited or reviewed.

This entry was initially translated by 信白.


File: Exponential distribution cdf.png
Cumulative distribution function for the exponential distribution

File: Normal Distribution CDF.svg
Cumulative distribution function for the normal distribution


In probability theory and statistics, the cumulative distribution function (CDF) of a real-valued random variable [math]\displaystyle{ X }[/math], or just distribution function of [math]\displaystyle{ X }[/math], evaluated at [math]\displaystyle{ x }[/math], is the probability that [math]\displaystyle{ X }[/math] will take a value less than or equal to [math]\displaystyle{ x }[/math].[1]



In the case of a scalar continuous distribution, it gives the area under the probability density function from minus infinity to [math]\displaystyle{ x }[/math]. Cumulative distribution functions are also used to specify the distribution of multivariate random variables.



Definition

The cumulative distribution function of a real-valued random variable [math]\displaystyle{ X }[/math] is the function given by[2]:p. 77

[math]\displaystyle{ F_X(x) = \operatorname{P}(X\leq x) }[/math]     (Eq.1)


where the right-hand side represents the probability that the random variable [math]\displaystyle{ X }[/math] takes on a value less than or equal to [math]\displaystyle{ x }[/math]. The probability that [math]\displaystyle{ X }[/math] lies in the semi-closed interval [math]\displaystyle{ (a,b] }[/math], where [math]\displaystyle{ a \lt b }[/math], is therefore[2]:p. 84

[math]\displaystyle{ \operatorname{P}(a \lt X \le b)= F_X(b)-F_X(a) }[/math]     (Eq.2)



In the definition above, the "less than or equal to" sign, "≤", is a convention, not a universally used one (e.g. Hungarian literature uses "<"), but the distinction is important for discrete distributions. The proper use of tables of the binomial and Poisson distributions depends upon this convention. Moreover, important formulas like Paul Lévy's inversion formula for the characteristic function also rely on the "less than or equal" formulation.



If treating several random variables [math]\displaystyle{ X,Y,\ldots }[/math] etc. the corresponding letters are used as subscripts while, if treating only one, the subscript is usually omitted. It is conventional to use a capital [math]\displaystyle{ F }[/math] for a cumulative distribution function, in contrast to the lower-case [math]\displaystyle{ f }[/math] used for probability density functions and probability mass functions. This applies when discussing general distributions: some specific distributions have their own conventional notation, for example the normal distribution.

The probability density function of a continuous random variable can be determined from the cumulative distribution function by differentiating, using the Fundamental Theorem of Calculus; i.e. given [math]\displaystyle{ F(x) }[/math],

[math]\displaystyle{ f(x) = {dF(x) \over dx} }[/math]

as long as the derivative exists. (Montgomery, Douglas C.; Runger, George C. (2003). Applied Statistics and Probability for Engineers. John Wiley & Sons, Inc. p. 104. ISBN 0-471-20454-4. http://www.um.edu.ar/math/montgomery.pdf)


The CDF of a continuous random variable [math]\displaystyle{ X }[/math] can be expressed as the integral of its probability density function [math]\displaystyle{ f_X }[/math] as follows:[2]:p. 86


[math]\displaystyle{ F_X(x) = \int_{-\infty}^x f_X(t)\,dt. }[/math]
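As a rough numerical check of this integral relationship, the sketch below integrates an exponential density with a simple trapezoidal rule and compares the result with the closed form 1 − e^(−λx). The function names, the rate value and the step count are illustrative assumptions rather than anything prescribed by the article.

```python
import math

def exp_pdf(t, lam):
    """Density of the exponential distribution (zero for t < 0)."""
    return lam * math.exp(-lam * t) if t >= 0 else 0.0

def cdf_by_integration(pdf, x, lower, steps=100_000):
    """Approximate F(x) as the integral of pdf from `lower` to x (trapezoidal rule).

    `lower` stands in for minus infinity; it must lie at or below the support.
    """
    if x <= lower:
        return 0.0
    h = (x - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(x))
    for i in range(1, steps):
        total += pdf(lower + i * h)
    return total * h

lam = 1.5  # illustrative rate parameter (assumption)
x = 2.0
approx = cdf_by_integration(lambda t: exp_pdf(t, lam), x, lower=0.0)
exact = 1.0 - math.exp(-lam * x)
print(approx, exact)
```

The lower limit can be set to 0 here because the exponential density vanishes for negative arguments; for a density supported on the whole real line a sufficiently small lower limit would be needed instead.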

In the case of a random variable [math]\displaystyle{ X }[/math] whose distribution has a discrete component at a value [math]\displaystyle{ b }[/math],


[math]\displaystyle{ \operatorname{P}(X=b) = F_X(b) - \lim_{x \to b^{-}} F_X(x). }[/math]


If [math]\displaystyle{ F_X }[/math] is continuous at [math]\displaystyle{ b }[/math], this equals zero and there is no discrete component at [math]\displaystyle{ b }[/math].



Properties

File: Discrete probability distribution illustration.svg
From top to bottom, the cumulative distribution function of a discrete probability distribution, continuous probability distribution, and a distribution which has both a continuous part and a discrete part.

Every cumulative distribution function [math]\displaystyle{ F_X }[/math] is non-decreasing[2]:p. 78 and right-continuous[2]:p. 79, which makes it a càdlàg function. Furthermore,


[math]\displaystyle{ \lim_{x\to -\infty}F_X(x)=0, \quad \lim_{x\to +\infty}F_X(x)=1. }[/math]


Every function with these four properties is a CDF, i.e., for every such function, a random variable can be defined such that the function is the cumulative distribution function of that random variable.



If [math]\displaystyle{ X }[/math] is a purely discrete random variable, then it attains values [math]\displaystyle{ x_1,x_2,\ldots }[/math] with probability [math]\displaystyle{ p_i = p(x_i) }[/math], and the CDF of [math]\displaystyle{ X }[/math] will be discontinuous at the points [math]\displaystyle{ x_i }[/math]:


[math]\displaystyle{ F_X(x) = \operatorname{P}(X\leq x) = \sum_{x_i \leq x} \operatorname{P}(X = x_i) = \sum_{x_i \leq x} p(x_i). }[/math]

If the CDF [math]\displaystyle{ F_X }[/math] of a real-valued random variable [math]\displaystyle{ X }[/math] is continuous, then [math]\displaystyle{ X }[/math] is a continuous random variable; if furthermore [math]\displaystyle{ F_X }[/math] is absolutely continuous, then there exists a Lebesgue-integrable function [math]\displaystyle{ f_X(x) }[/math] such that

[math]\displaystyle{ F_X(b)-F_X(a) = \operatorname{P}(a\lt X\leq b) = \int_a^b f_X(x)\,dx }[/math]


for all real numbers [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math]. The function [math]\displaystyle{ f_X }[/math] is equal to the derivative of [math]\displaystyle{ F_X }[/math] almost everywhere, and it is called the probability density function of the distribution of [math]\displaystyle{ X }[/math].



Examples

As an example, suppose [math]\displaystyle{ X }[/math] is uniformly distributed on the unit interval [math]\displaystyle{ [0,1] }[/math].


Then the CDF of [math]\displaystyle{ X }[/math] is given by


[math]\displaystyle{ F_X(x) = \begin{cases} 0 &:\ x \lt 0\\ x &:\ 0 \le x \le 1\\ 1 &:\ x \gt 1 \end{cases} }[/math]

Suppose instead that [math]\displaystyle{ X }[/math] takes only the discrete values 0 and 1, with equal probability.

Then the CDF of [math]\displaystyle{ X }[/math] is given by

[math]\displaystyle{ F_X(x) = \begin{cases} 0 &:\ x \lt 0\\ 1/2 &:\ 0 \le x \lt 1\\ 1 &:\ x \ge 1 \end{cases} }[/math]

Suppose [math]\displaystyle{ X }[/math] is exponentially distributed. Then the CDF of [math]\displaystyle{ X }[/math] is given by

[math]\displaystyle{ F_X(x;\lambda) = \begin{cases} 1-e^{-\lambda x} & x \ge 0, \\ 0 & x \lt 0. \end{cases} }[/math]

Here λ > 0 is the parameter of the distribution, often called the rate parameter.
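The exponential CDF is simple enough to evaluate directly. The following minimal sketch also applies the interval rule P(a < X ≤ b) = F(b) − F(a) from the definition section; the function names and the rate value 0.5 are illustrative assumptions.

```python
import math

def exponential_cdf(x, lam):
    """F(x; lam) = 1 - exp(-lam * x) for x >= 0, and 0 for x < 0."""
    if lam <= 0:
        raise ValueError("the rate parameter must be positive")
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def interval_probability(cdf, a, b):
    """P(a < X <= b) = F(b) - F(a), assuming a < b."""
    return cdf(b) - cdf(a)

lam = 0.5  # illustrative rate parameter (assumption)
p = interval_probability(lambda x: exponential_cdf(x, lam), 1.0, 3.0)
print(p)  # equals exp(-0.5) - exp(-1.5) up to rounding
```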

Suppose [math]\displaystyle{ X }[/math] is normally distributed. Then the CDF of [math]\displaystyle{ X }[/math] is given by

[math]\displaystyle{ F(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^x \exp \left( -\frac{(t - \mu)^2}{2\sigma^2} \right)\, dt. }[/math]

Here the parameter [math]\displaystyle{ \mu }[/math] is the mean or expectation of the distribution; and [math]\displaystyle{ \sigma }[/math] is its standard deviation.

Suppose [math]\displaystyle{ X }[/math] is binomially distributed. Then the CDF of [math]\displaystyle{ X }[/math] is given by

[math]\displaystyle{ F(k;n,p)=\Pr(X\leq k)=\sum _{i=0}^{\lfloor k\rfloor }{n \choose i}p^{i}(1-p)^{n-i} }[/math]

Here [math]\displaystyle{ p }[/math] is the probability of success and the function denotes the discrete probability distribution of the number of successes in a sequence of [math]\displaystyle{ n }[/math] independent experiments, and [math]\displaystyle{ \lfloor k\rfloor\, }[/math] is the "floor" under [math]\displaystyle{ k }[/math], i.e. the greatest integer less than or equal to [math]\displaystyle{ k }[/math].
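The binomial sum can be evaluated term by term. This is a minimal sketch of that formula using only the standard library; the function name is an assumption, and the handling of k < 0 and k > n (returning 0 and capping the sum at n) follows from the CDF definition rather than from the formula as printed.

```python
import math

def binomial_cdf(k, n, p):
    """F(k; n, p) = sum over i = 0..floor(k) of C(n, i) p^i (1 - p)^(n - i)."""
    if k < 0:
        return 0.0  # no successes can be negative, so P(X <= k) = 0
    upper = min(n, math.floor(k))  # terms beyond i = n are zero
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(upper + 1))

print(binomial_cdf(2, 4, 0.5))    # (1 + 4 + 6) / 16
print(binomial_cdf(2.7, 4, 0.5))  # same value: the floor of 2.7 is 2
```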

Derived functions

Complementary cumulative distribution function (tail distribution)

Sometimes, it is useful to study the opposite question and ask how often the random variable is above a particular level. This is called the complementary cumulative distribution function (ccdf) or simply the tail distribution or exceedance, and is defined as

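[math]\displaystyle{ \bar F_X(x) = \operatorname{P}(X \gt x) = 1 - F_X(x). }[/math]

This has applications in statistical hypothesis testing, for example, because the one-sided p-value is the probability of observing a test statistic at least as extreme as the one observed. Thus, provided that the test statistic, T, has a continuous distribution, the one-sided p-value is simply given by the ccdf: for an observed value [math]\displaystyle{ t }[/math] of the test statistic

[math]\displaystyle{ p= \operatorname{P}(T \ge t) = \operatorname{P}(T \gt t) =1 - F_T(t). }[/math]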

In survival analysis, [math]\displaystyle{ \bar F_X(x) }[/math] is called the survival function and denoted [math]\displaystyle{ S(x) }[/math], while the term reliability function is common in engineering.

Z-table:

One of the most popular applications of the cumulative distribution function is the standard normal table, also called the unit normal table or Z table,[3] which lists values of the cumulative distribution function of the standard normal distribution. The Z-table is useful not only for probabilities below a value, which is the original application of the cumulative distribution function, but also for probabilities above a value and/or between two values of the standard normal distribution, and its use extends to any normal distribution.
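The three kinds of lookup that a Z-table supports (below, above, between) can be sketched in code. The example computes the standard normal CDF from the error function in the standard library instead of reading a printed table; the function names are assumptions, and the standardization (x − μ)/σ is the usual route from a general normal distribution to the standard one.

```python
import math

def std_normal_cdf(z):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and standard deviation sigma > 0."""
    return std_normal_cdf((x - mu) / sigma)

below = std_normal_cdf(1.0)                            # P(Z <= 1)
above = 1.0 - std_normal_cdf(1.0)                      # P(Z > 1), the tail
between = std_normal_cdf(1.0) - std_normal_cdf(-1.0)   # P(-1 < Z <= 1)
print(round(below, 4), round(above, 4), round(between, 4))  # 0.8413 0.1587 0.6827
```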


Properties

  • For a non-negative continuous random variable having an expectation, Markov's inequality states that
[math]\displaystyle{ \bar F_X(x) \leq \frac{\operatorname{E}(X)}{x} . }[/math]
  • As [math]\displaystyle{ x \to \infty, \bar F_X(x) \to 0 \ }[/math], and in fact [math]\displaystyle{ \bar F_X(x) = o(1/x) }[/math] provided that [math]\displaystyle{ \operatorname{E}(X) }[/math] is finite.

Proof:[citation needed] Assuming [math]\displaystyle{ X }[/math] has a density function [math]\displaystyle{ f_X }[/math], for any [math]\displaystyle{ c\gt 0 }[/math]

[math]\displaystyle{ \operatorname{E}(X) = \int_0^\infty x f_X(x) \, dx \geq \int_0^c x f_X(x) \, dx + c\int_c^\infty f_X(x) \, dx }[/math]

Then, on recognizing [math]\displaystyle{ \bar F_X(c) = \int_c^\infty f_X(x) \, dx }[/math] and rearranging terms,

[math]\displaystyle{ 0 \leq c\bar F_X(c) \leq \operatorname{E}(X) - \int_0^c x f_X(x) \, dx \to 0 \text{ as } c \to \infty }[/math]

as claimed.


Folded cumulative distribution

File: Folded-cumulative-distribution-function.svg
Example of the folded cumulative distribution for a normal distribution function with an expected value of 0 and a standard deviation of 1.

While the plot of a cumulative distribution often has an S-like shape, an alternative illustration is the folded cumulative distribution or mountain plot, which folds the top half of the graph over,[5][6] thus using two scales, one for the upslope and another for the downslope. This form of illustration emphasises the median and dispersion (specifically, the mean absolute deviation from the median[7]) of the distribution or of the empirical results.
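A minimal sketch of the folding step, assuming the common reading in which values of F above 0.5 are replaced by 1 − F so that the curve peaks at the median. The function names are illustrative, and the standard normal CDF is computed from the error function.

```python
import math

def std_normal_cdf(x):
    """CDF of the standard normal distribution (mean 0, standard deviation 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def folded_cdf(cdf, x):
    """Fold the upper half of the CDF over: F(x) if F(x) <= 0.5, else 1 - F(x)."""
    f = cdf(x)
    return f if f <= 0.5 else 1.0 - f

# The folded curve rises to 0.5 at the median (here 0) and falls again afterwards.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, round(folded_cdf(std_normal_cdf, x), 4))
```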


Inverse distribution function (quantile function)

If the CDF F is strictly increasing and continuous then [math]\displaystyle{ F^{-1}( p ), p \in [0,1], }[/math] is the unique real number [math]\displaystyle{ x }[/math] such that [math]\displaystyle{ F(x) = p }[/math]. In such a case, this defines the inverse distribution function or quantile function.

Some distributions do not have a unique inverse (for example in the case where [math]\displaystyle{ f_X(x)=0 }[/math] for all [math]\displaystyle{ a\lt x\lt b }[/math], causing [math]\displaystyle{ F_X }[/math] to be constant). This problem can be solved by defining, for [math]\displaystyle{ p \in [0,1] }[/math], the generalized inverse distribution function:

[math]\displaystyle{ F^{-1}(p) = \inf \{x \in \mathbb{R}: F(x) \geq p \}. }[/math]

  • Example 1: The median is [math]\displaystyle{ F^{-1}( 0.5 ) }[/math].
  • Example 2: Put [math]\displaystyle{ \tau = F^{-1}( 0.95 ) }[/math]. Then we call [math]\displaystyle{ \tau }[/math] the 95th percentile.
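For a discrete distribution the CDF is a step function, so the infimum in the generalized inverse is attained at the first support point whose cumulative probability reaches p. The sketch below illustrates this with a small hypothetical three-point distribution; the helper name `make_quantile` and the example values are assumptions.

```python
import bisect
from itertools import accumulate

def make_quantile(values, probs):
    """Return p -> inf{x : F(x) >= p} for a discrete distribution.

    `values` must be sorted in ascending order and `probs` must sum to 1.
    """
    cumulative = list(accumulate(probs))  # F evaluated at each support point

    def quantile(p):
        if not 0.0 < p <= 1.0:
            raise ValueError("p must lie in (0, 1]")
        # Index of the first support point with F(x_i) >= p.
        i = bisect.bisect_left(cumulative, p)
        return values[min(i, len(values) - 1)]  # guard against rounding in the last sum

    return quantile

quantile = make_quantile([1, 2, 3], [0.2, 0.3, 0.5])  # hypothetical distribution
print(quantile(0.5))   # median
print(quantile(0.95))  # 95th percentile
```

The case p = 0 is excluded in this sketch because the infimum over all real x with F(x) ≥ 0 is not a finite number.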


Some useful properties of the inverse cdf (which are also preserved in the definition of the generalized inverse distribution function) are:

  1. [math]\displaystyle{ F^{-1} }[/math] is nondecreasing
  2. [math]\displaystyle{ F^{-1}(F(x)) \leq x }[/math]
  3. [math]\displaystyle{ F(F^{-1}(p)) \geq p }[/math]
  4. [math]\displaystyle{ F^{-1}(p) \leq x }[/math] if and only if [math]\displaystyle{ p \leq F(x) }[/math]
  5. If [math]\displaystyle{ Y }[/math] has a [math]\displaystyle{ U[0, 1] }[/math] distribution then [math]\displaystyle{ F^{-1}(Y) }[/math] is distributed as [math]\displaystyle{ F }[/math]. This is used in random number generation using the inverse transform sampling-method.
  6. If [math]\displaystyle{ \{X_\alpha\} }[/math] is a collection of independent [math]\displaystyle{ F }[/math]-distributed random variables defined on the same sample space, then there exist random variables [math]\displaystyle{ Y_\alpha }[/math] such that [math]\displaystyle{ Y_\alpha }[/math] is distributed as [math]\displaystyle{ U[0,1] }[/math] and [math]\displaystyle{ F^{-1}(Y_\alpha) = X_\alpha }[/math] with probability 1 for all [math]\displaystyle{ \alpha }[/math].


The inverse of the cdf can be used to translate results obtained for the uniform distribution to other distributions.
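Inverse transform sampling can be sketched for the exponential distribution, whose CDF 1 − e^(−λx) inverts in closed form to −ln(1 − p)/λ. The function names, the seed and the sample size are illustrative assumptions.

```python
import math
import random

def exponential_quantile(p, lam):
    """Inverse of F(x) = 1 - exp(-lam * x): F^{-1}(p) = -ln(1 - p) / lam, for 0 <= p < 1."""
    return -math.log(1.0 - p) / lam

def sample_exponential(n, lam, seed=0):
    """Draw n exponential variates by pushing U[0, 1) draws through the quantile function."""
    rng = random.Random(seed)  # seeded for reproducibility (assumption)
    return [exponential_quantile(rng.random(), lam) for _ in range(n)]

lam = 2.0
samples = sample_exponential(100_000, lam)
mean = sum(samples) / len(samples)
print(mean)  # should be close to the theoretical mean 1 / lam = 0.5
```

Because `random.random()` returns values in [0, 1), the argument of the logarithm stays strictly positive.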


Empirical distribution function


The empirical distribution function is an estimate of the cumulative distribution function that generated the points in the sample. It converges with probability 1 to that underlying distribution. A number of results exist to quantify the rate of convergence of the empirical distribution function to the underlying cumulative distribution function[citation needed].
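A minimal sketch of the empirical distribution function, taking the usual definition as the fraction of sample points less than or equal to x. The helper name `make_ecdf` and the toy sample are assumptions.

```python
import bisect

def make_ecdf(sample):
    """Return x -> (number of sample points <= x) / n for a non-empty sample."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("the sample must not be empty")

    def ecdf(x):
        # bisect_right counts the elements that are <= x in the sorted list.
        return bisect.bisect_right(xs, x) / n

    return ecdf

ecdf = make_ecdf([3.0, 1.0, 2.0, 2.0])  # toy sample with one tie
print(ecdf(0.5), ecdf(2.0), ecdf(3.0))  # 0.0 0.75 1.0
```

Like any CDF of a discrete distribution, the result is a right-continuous step function, with a jump of size k/n at a value that occurs k times in the sample.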


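Multivariate case

Definition for two random variables

When dealing simultaneously with more than one random variable the joint cumulative distribution function can also be defined. For example, for a pair of random variables [math]\displaystyle{ X,Y }[/math], the joint CDF [math]\displaystyle{ F_{XY} }[/math] is given by[2]:p. 89

[math]\displaystyle{ F_{XY}(x,y) = \operatorname{P}(X\leq x, Y\leq y) }[/math]

where the right-hand side represents the probability that [math]\displaystyle{ X }[/math] takes on a value less than or equal to [math]\displaystyle{ x }[/math] and that [math]\displaystyle{ Y }[/math] takes on a value less than or equal to [math]\displaystyle{ y }[/math].

Example: given the joint probability mass function in tabular form, determine the joint cumulative distribution function.

{| class="wikitable"
|
|Y = 2
|Y = 4
|Y = 6
|Y = 8
|-
|X = 1
|0
|0.1
|0
|0.1
|-
|X = 3
|0
|0
|0.2
|0
|-
|X = 5
|0.3
|0
|0
|0.15
|-
|X = 7
|0
|0
|0.15
|0
|}

Solution: using the given table of probabilities for each potential range of X and Y, the joint cumulative distribution function may be constructed in tabular form:

{| class="wikitable"
|
|Y < 2
|2 ≤ Y < 4
|4 ≤ Y < 6
|6 ≤ Y < 8
|Y ≥ 8
|-
|X < 1
|0
|0
|0
|0
|0
|-
|1 ≤ X < 3
|0
|0
|0.1
|0.1
|0.2
|-
|3 ≤ X < 5
|0
|0
|0.1
|0.3
|0.4
|-
|5 ≤ X < 7
|0
|0.3
|0.4
|0.6
|0.85
|-
|X ≥ 7
|0
|0.3
|0.4
|0.75
|1
|}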
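The tabular joint CDF in this article's two-variable example can be reproduced by summing the joint probabilities over all support points (x_i, y_j) with x_i ≤ x and y_j ≤ y. The sketch below uses the probabilities of that example, with X in {1, 3, 5, 7} and Y in {2, 4, 6, 8}; the variable and function names are assumptions.

```python
# Joint probabilities P(X = xs[i], Y = ys[j]) from the article's example table.
xs = [1, 3, 5, 7]
ys = [2, 4, 6, 8]
pmf = [
    [0.0, 0.1, 0.0, 0.1],   # X = 1
    [0.0, 0.0, 0.2, 0.0],   # X = 3
    [0.3, 0.0, 0.0, 0.15],  # X = 5
    [0.0, 0.0, 0.15, 0.0],  # X = 7
]

def joint_cdf(x, y):
    """F_XY(x, y) = P(X <= x, Y <= y), summed directly over the table."""
    return sum(
        pmf[i][j]
        for i, xi in enumerate(xs) if xi <= x
        for j, yj in enumerate(ys) if yj <= y
    )

print(round(joint_cdf(5, 8), 2))  # row 5 <= X < 7, column Y >= 8
print(round(joint_cdf(3, 6), 2))  # row 3 <= X < 5, column 6 <= Y < 8
```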

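Definition for more than two random variables

For [math]\displaystyle{ N }[/math] random variables [math]\displaystyle{ X_1,\ldots,X_N }[/math], the joint CDF [math]\displaystyle{ F_{X_1,\ldots,X_N} }[/math] is given by

[math]\displaystyle{ F_{X_1,\ldots,X_N}(x_1,\ldots,x_N) = \operatorname{P}(X_1 \leq x_1,\ldots,X_N \leq x_N) }[/math]

Interpreting the [math]\displaystyle{ N }[/math] random variables as a random vector [math]\displaystyle{ \mathbf{X} = (X_1,\ldots,X_N)^T }[/math] yields a shorter notation:

[math]\displaystyle{ F_{\mathbf{X}}(\mathbf{x}) = \operatorname{P}(X_1 \leq x_1,\ldots,X_N \leq x_N) }[/math]

Properties

Every multivariate CDF is:

  1. Monotonically non-decreasing for each of its variables,
  2. Right-continuous in each of its variables,
  3. [math]\displaystyle{ 0\leq F_{X_1 \ldots X_N}(x_1,\ldots,x_N)\leq 1, }[/math]
  4. [math]\displaystyle{ \lim_{x_1,\ldots,x_N \rightarrow+\infty}F_{X_1 \ldots X_N}(x_1,\ldots,x_N)=1 \text{ and } \lim_{x_i\rightarrow-\infty}F_{X_1 \ldots X_N}(x_1,\ldots,x_N)=0, \text{ for all } i. }[/math]

The probability that a point belongs to a hyperrectangle is analogous to the 1-dimensional case:

[math]\displaystyle{ F_{X_1,X_2}(a, c) + F_{X_1,X_2}(b, d) - F_{X_1,X_2}(a, d) - F_{X_1,X_2}(b, c) = \operatorname{P}(a \lt X_1 \leq b, c \lt X_2 \leq d) = \int \ldots }[/math]

Complex case

The generalization of the cumulative distribution function from real to complex random variables is not obvious because expressions of the form [math]\displaystyle{ P(Z \leq 1+2i) }[/math] make no sense. However, expressions of the form [math]\displaystyle{ P(\Re{(Z)} \leq 1, \Im{(Z)} \leq 3) }[/math] make sense. Therefore, we define the cumulative distribution of a complex random variable via the joint distribution of its real and imaginary parts:

[math]\displaystyle{ F_Z(z)=F_{\Re{(Z)},\Im{(Z)}}(\Re{(z)},\Im{(z)})=P(\Re{(Z)} \leq \Re{(z)} , \Im{(Z)} \leq \Im{(z)}) . }[/math]

Generalizing this to several components yields

[math]\displaystyle{ F_{\mathbf{Z}}(\mathbf{z}) = F_{\Re{(Z_1)},\Im{(Z_1)}, \ldots, \Re{(Z_N)},\Im{(Z_N)}}(\Re{(z_1)}, \Im{(z_1)},\ldots,\Re{(z_N)}, \Im{(z_N)}) = \operatorname{P}(\Re{(Z_1)} \leq \Re{(z_1)},\Im{(Z_1)} \leq \Im{(z_1)},\ldots,\Re{(Z_N)} \leq \Re{(z_N)},\Im{(Z_N)} \leq \Im{(z_N)}) }[/math]

as the definition of the CDF of a complex random vector [math]\displaystyle{ \mathbf{Z} = (Z_1,\ldots,Z_N)^T }[/math].

Use in statistical analysis

The concept of the cumulative distribution function makes an explicit appearance in statistical analysis in two (similar) ways. Cumulative frequency analysis is the analysis of the frequency of occurrence of values of a phenomenon less than a reference value. The empirical distribution function is a formal direct estimate of the cumulative distribution function for which simple statistical properties can be derived and which can form the basis of various statistical hypothesis tests. Such tests can assess whether there is evidence against a sample of data having arisen from a given distribution, or evidence against two samples of data having arisen from the same (unknown) population distribution.

The Kolmogorov–Smirnov test is based on cumulative distribution functions and can be used to test whether two empirical distributions are different or whether an empirical distribution is different from an ideal distribution. The closely related Kuiper's test is useful if the domain of the distribution is cyclic, as in day of the week. For instance, Kuiper's test might be used to see if the number of tornadoes varies during the year or if sales of a product vary by day of the week or day of the month.

Category:Functions related to probability distributions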


This page was moved from wikipedia:en:Cumulative distribution function. Its edit history can be viewed at 分布函数/edithistory

  1. Deisenroth, Marc Peter; Faisal, A. Aldo; Ong, Cheng Soon (2020). Mathematics for Machine Learning. Cambridge University Press. p. 181. ISBN 9781108455145. https://github.com/mml-book/mml-book.github.io.
  2. Park, Kun Il (2018). Fundamentals of Probability and Stochastic Processes with Applications to Communications. Springer. ISBN 978-3-319-68074-3.
  3. "Z Table". Z Table. Retrieved 2019-12-11.
  4. Zwillinger, Daniel; Kokoska, Stephen (2010). CRC Standard Probability and Statistics Tables and Formulae. CRC Press. p. 49. ISBN 978-1-58488-059-2.
  5. Gentle, J.E. (2009). Computational Statistics. Springer. ISBN 978-0-387-98145-1. https://books.google.com/?id=m4r-KVxpLsAC&pg=PA348. Retrieved 2010-08-06. [page needed]
  6. Monti, K. L. (1995). "Folded Empirical Distribution Function Curves (Mountain Plots)". The American Statistician. 49 (4): 342–345. doi:10.2307/2684570. JSTOR 2684570.
  7. Xue, J. H.; Titterington, D. M. (2011). "The p-folded cumulative distribution function and the mean absolute deviation from the p-quantile". Statistics & Probability Letters. 81 (8): 1179–1182. doi:10.1016/j.spl.2011.03.014. https://hal.archives-ouvertes.fr/hal-00753950/file/PEER_stage2_10.1016%252Fj.spl.2011.03.014.pdf