Probability distribution

Revision as of 08:43, 13 September 2020 (Sun)

3,087 bytes removed, 08:43, 13 September 2020 (Sun)

No edit summary

This entry was initially translated by Ryan.

{{short description|Mathematical function that describes the probability of occurrence of different possible outcomes in an experiment}}

In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of the different possible outcomes of an experiment. It is a mathematical description of a random phenomenon in terms of its sample space and the probabilities of events (subsets of the sample space).

For instance, if the [[random variable]] {{mvar|X}} is used to denote the outcome of a coin toss ("the experiment"), then the probability distribution of {{mvar|X}} would take the value 0.5 for {{math|''X'' {{=}} heads}}, and 0.5 for {{math|''X'' {{=}} tails}} (assuming the coin is fair). Examples of random phenomena include the weather conditions on a future date, the height of a randomly selected person, the fraction of male students in a school, the results of a [[Survey methodology|survey]], etc.<ref name="ross">{{cite book|first=Sheldon M.|last=Ross|title=A first course in probability|publisher=Pearson|year=2010}}</ref>

A probability distribution is a mathematical function that takes events (subsets of a sample space) as its input and gives a probability as its output. The sample space is the set of all possible outcomes of a random phenomenon being observed; it may be the set of real numbers or a set of vectors, or it may be a list of non-numerical values. For example, the sample space of a coin flip would be {heads, tails}.

Probability distributions are generally divided into two classes. A discrete probability distribution is applicable to scenarios where the set of possible outcomes is discrete (e.g. a coin toss or the roll of a die), and here the probabilities are encoded by a discrete list of the probabilities of the outcomes, known as the probability mass function. On the other hand, continuous probability distributions are applicable to scenarios where the set of possible outcomes can take on values in a continuous range (e.g. real numbers), such as the temperature on a given day. In this case, probabilities are typically described by a probability density function. The normal distribution is a commonly encountered continuous probability distribution. More complex experiments, such as those involving stochastic processes defined in continuous time, may demand the use of more general probability measures.

A probability distribution whose sample space is one-dimensional (for example real numbers, a list of labels, ordered labels or binary values) is called univariate, while a distribution whose sample space is a vector space of dimension 2 or more is called multivariate. A univariate distribution gives the probabilities of a single random variable taking on various alternative values; a multivariate distribution (a joint probability distribution) gives the probabilities of a random vector – a list of two or more random variables – taking on various combinations of values. Important and commonly encountered univariate probability distributions include the binomial distribution, the hypergeometric distribution, and the normal distribution. The multivariate normal distribution is a commonly encountered multivariate distribution.
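To make the univariate/multivariate distinction concrete, the following minimal Python sketch (standard library only) treats the outcome of two independent fair dice as a random vector (D1, D2). It stores the joint pmf as a dictionary and recovers the univariate marginal pmf of D1 by summing the joint pmf over all values of D2. The dice setup and the names `joint` and `marginal_d1` are illustrative assumptions, not part of the source.

```python
from fractions import Fraction
from itertools import product

# Joint pmf of the random vector (D1, D2) for two independent fair dice:
# each of the 36 ordered pairs has probability 1/36.
joint = {(d1, d2): Fraction(1, 36) for d1, d2 in product(range(1, 7), repeat=2)}

# Marginal pmf of D1: add up the joint pmf over every value of D2.
marginal_d1 = {}
for (d1, _d2), p in joint.items():
    marginal_d1[d1] = marginal_d1.get(d1, Fraction(0)) + p

print(sum(joint.values()))  # total probability of the joint distribution
print(marginal_d1)          # univariate pmf of the first die
```

Exact `Fraction` arithmetic is used so that the totals come out as exact rational numbers rather than rounded floats.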

==Introduction==

[[File:Dice Distribution (bar).svg|thumb|250px|right|Figure 1: The [[probability mass function]] (pmf) ''p''(''S'') specifies the probability distribution for the sum ''S'' of counts from two [[dice]]. For example, the figure shows that ''p''(11) = 2/36 = 1/18. The pmf allows the computation of probabilities of events such as ''P''(''S'' > 9) = 1/12 + 1/18 + 1/36 = 1/6, and all other probabilities in the distribution.]]

To define probability distributions for the simplest cases, it is necessary to distinguish between discrete and continuous random variables. In the discrete case, it is sufficient to specify a probability mass function p assigning a probability to each possible outcome: for example, when throwing a fair die, each of the six values 1 to 6 has the probability 1/6. The probability of an event is then defined to be the sum of the probabilities of the outcomes that satisfy the event; for example, the probability of the event "the die shows an even value" is

:<math>p(2) + p(4) + p(6) = 1/6+1/6+1/6=1/2.</math>
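The rule "probability of an event = sum of the pmf over the outcomes in it" can be checked with a short Python sketch. It assumes fair dice, represents a pmf as a dictionary from outcome to exact probability, and uses a hypothetical helper `prob` that takes the event as a predicate. It reproduces the even-value calculation above and the two-dice figures from Figure 1 (p(11) and P(S > 9)).

```python
from fractions import Fraction
from itertools import product

# pmf of one fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(pmf, event):
    """Probability of an event: sum of the pmf over the outcomes that satisfy it."""
    return sum((p for x, p in pmf.items() if event(x)), Fraction(0))

print(prob(die, lambda x: x % 2 == 0))  # even value on one die

# pmf of the sum S of two independent fair dice, accumulated over the 36 pairs.
sum_pmf = {}
for a, b in product(die, repeat=2):
    sum_pmf[a + b] = sum_pmf.get(a + b, Fraction(0)) + die[a] * die[b]

print(sum_pmf[11])                     # p(11)
print(prob(sum_pmf, lambda s: s > 9))  # P(S > 9)
```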

In contrast, when a random variable takes values from a continuum, then typically any individual outcome has probability zero and only events that include infinitely many outcomes, such as intervals, can have positive probability. For example, the probability that a given object weighs exactly 500 g is zero, because the probability of measuring exactly 500 g tends to zero as the accuracy of our measuring instruments increases. Nevertheless, in quality control one might demand that the probability of a "500 g" package containing between 490 g and 510 g be no less than 98%, and this demand is less sensitive to the accuracy of measurement instruments.
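Both points can be illustrated numerically. The text specifies no model for package weights, so the Python sketch below assumes, purely for illustration, a normal distribution with mean 500 g and a hypothetical standard deviation of 4 g. Interval probabilities are computed as differences of the normal cumulative distribution function, written with `math.erf`. Shrinking the interval around 500 g drives its probability towards zero, while the 490 g to 510 g requirement can still be checked.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_between(a, b, mu, sigma):
    """P(a <= X <= b) for a continuous X; the endpoints themselves carry no probability."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

MU, SIGMA = 500.0, 4.0  # hypothetical package-weight model, in grams

# Narrower and narrower intervals around 500 g: the probability tends to zero.
for half_width in (10.0, 1.0, 0.1, 0.001):
    print(half_width, prob_between(MU - half_width, MU + half_width, MU, SIGMA))

# Quality-control requirement from the text: P(490 g <= X <= 510 g) >= 0.98.
p_ok = prob_between(490.0, 510.0, MU, SIGMA)
print(p_ok, p_ok >= 0.98)
```

Under this assumed model the requirement is met (about 0.988). With a standard deviation of 5 g instead, the same interval has probability of only about 0.95, so the requirement would fail.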

Continuous probability distributions can be described in several ways. The probability density function describes the infinitesimal probability of any given value, and the probability that the outcome lies in a given interval can be computed by integrating the probability density function over that interval. The probability that the possible values lie in some fixed interval can be related to the way sums converge to an integral; therefore, continuous probability is based on the definition of an integral.
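The link between sums and integrals can be seen directly. The Python sketch below approximates the integral of the standard normal density over an interval with a composite midpoint rule, which is a sum of many small rectangles. It then compares the result with the exact value obtained from the cumulative distribution function. The choice of distribution, interval [-1, 2], and number of subintervals are illustrative assumptions.

```python
import math

def std_normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def integrate(f, a, b, n=10_000):
    """Composite midpoint rule: a Riemann sum approximating the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

a, b = -1.0, 2.0
by_integral = integrate(std_normal_pdf, a, b)
by_cdf = std_normal_cdf(b) - std_normal_cdf(a)
print(by_integral, by_cdf)  # the Riemann sum approaches the exact interval probability
```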

[[File:Combined Cumulative Distribution Graphs.png|thumb|455x455px|Figure 2: On the left is the probability density function. On the right is the cumulative distribution function, whose value at each point is the area under the probability density curve up to that point.]]

The [[cumulative distribution function]] describes the probability that the random variable is no larger than a given value; the probability that the outcome lies in a given interval can be computed by taking the difference between the values of the cumulative distribution function at the endpoints of the interval. The cumulative distribution function is the [[antiderivative]] of the probability density function, provided that the latter function exists. The cumulative distribution function is the area under the [[probability density function]] from <math>-\infty</math> to <math>x</math>, as described by the picture to the right.<ref>{{Cite book|title=A modern introduction to probability and statistics : understanding why and how|date=2005|publisher=Springer|others=Dekking, Michel, 1946-|isbn=978-1-85233-896-1|location=London|oclc=262680588}}</ref>
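The antiderivative relationship can be checked numerically. In the Python sketch below, a central finite difference of the standard normal cumulative distribution function is compared with its density at a few points. The standard normal distribution, the step size `h`, and the sample points are illustrative choices.

```python
import math

def std_normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def derivative(F, x, h=1e-5):
    """Central finite-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2.0 * h)

# Differentiating the CDF gives back the density, since the CDF is its antiderivative.
for x in (-2.0, 0.0, 1.5):
    print(x, derivative(std_normal_cdf, x), std_normal_pdf(x))
```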

[[File:Standard deviation diagram.svg|right|thumb|250px|Figure 3: The [[probability density function]] (pdf) of the [[normal distribution]], also called Gaussian or "bell curve", the most important continuous probability distribution. As indicated in the figure, the probabilities of intervals of values correspond to the area under the curve.]]
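As a worked instance of "probability of an interval = area under the curve", the Python sketch below computes the probability that a normal random variable falls within one, two or three standard deviations of its mean. For any normal distribution this area equals erf(k/√2). The function name `prob_within` is a hypothetical helper introduced here.

```python
import math

def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal X; equals erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

These are the familiar values of roughly 68%, 95% and 99.7%.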

== Terminology ==

Some key concepts and terms, widely used in the literature on the topic of probability distributions, are listed below.

=== Functions for discrete variables ===


*'''Probability function''': describes the probability distribution of a discrete random variable.
*'''[[Probability mass function|Probability mass function (pmf)]]:''' function that gives the probability that a discrete random variable is equal to some value.
*'''[[Frequency distribution]]''': a table that displays the frequency of various outcomes '''in a sample'''.
*'''Relative frequency distribution''': a [[frequency distribution]] where each value has been divided (normalized) by the number of outcomes in a [[Sample (statistics)|sample]], i.e., the sample size.
*'''Discrete probability distribution function''': general term to indicate the way the total probability of 1 is distributed over '''all''' various possible outcomes (i.e., over the entire population) for a discrete random variable.
*'''[[Cumulative distribution function]]''': function evaluating the [[probability]] that <math>X</math> will take a value less than or equal to <math>x</math> for a discrete random variable.
*'''[[Categorical distribution]]''': for discrete random variables with a finite set of values.

For a discrete random variable <math>X</math>, let <math>u_0, u_1, \dots</math> be the values it can take with non-zero probability. Denote

:<math>\Omega_i=X^{-1}(u_i)= \{\omega: X(\omega)=u_i\},\, i=0, 1, 2, \dots</math>

These are disjoint sets, and for such sets

:<math>P\left(\bigcup_i \Omega_i\right)=\sum_i P(\Omega_i)=\sum_i P(X=u_i)=1.</math>

It follows that the probability that <math>X</math> takes any value except for <math>u_0, u_1, \dots</math> is zero, and thus one can write <math>X</math> as

:<math>X(\omega)=\sum_i u_i 1_{\Omega_i}(\omega)</math>

except on a set of probability zero, where <math>1_A</math> is the indicator function of <math>A</math>. This may serve as an alternative definition of discrete random variables.

=== Functions for continuous variables ===

* '''[[Probability density function]] (pdf):''' function whose value at any given sample (or point) in the [[sample space]] (the set of possible values taken by the random variable) can be interpreted as providing a ''relative likelihood'' that the value of the random variable would equal that sample.
* '''Continuous probability distribution function''': most often reserved for continuous random variables.
* '''[[Cumulative distribution function]]''': function evaluating the [[probability]] that <math>X</math> will take a value less than or equal to <math>x</math> for a continuous variable.

A continuous probability distribution is a probability distribution whose support is an uncountable set, such as an interval in the real line. Such a distribution is uniquely characterized by a cumulative distribution function that can be used to calculate the probability for each subset of the support. There are many examples of continuous probability distributions: normal, uniform, chi-squared, and others.

A random variable <math>X</math> has a continuous probability distribution if there is a function <math>f: \mathbb{R} \rightarrow [0, \infty)</math> such that for each interval <math>I \subset \mathbb{R}</math> the probability of <math>X</math> belonging to <math>I</math> is given by the integral of <math>f</math> over <math>I</math>. For example, if <math>I = [a, b]</math>, then we would have:

:<math>\operatorname{P}\left[a \le X \le b\right] = \int_a^b f(x) \, dx</math>

In particular, the probability for <math>X</math> to take any single value <math>a</math> (that is, <math>a \le X \le a</math>) is zero, because an integral with coinciding upper and lower limits is always equal to zero. A variable that satisfies the above is called a continuous random variable. Its cumulative distribution function is defined as

:<math>F(x) = \operatorname{P}\left[-\infty < X \le x\right] = \int_{-\infty}^x f(t) \, dt</math>

which, by this definition, has the properties:
* <math>F(x)</math> is non-decreasing;
* <math>0 \le F(x) \le 1</math>;
* <math>\lim_{x \rightarrow -\infty} F(x) = 0</math> and <math>\lim_{x \rightarrow \infty} F(x) = 1</math>;
* <math>\operatorname{P}(a \le X < b) = F(b) - F(a)</math>; and
* <math>F(x)</math> is continuous (due to the properties of the Riemann integral).

===Basic terms===

* [[Mode (statistics)|'''Mode''']]: for a discrete random variable, the value with the highest probability; for a continuous random variable, a location at which the probability density function has a local peak.
* [[Support (mathematics)|'''Support''']]: the set of values that can be assumed, with non-zero probability, by the random variable. For a random variable <math>X</math>, it is sometimes denoted <math>R_X</math>.
* '''Tail''':<ref name='tail'>More information and examples can be found in the articles [[Heavy-tailed distribution]], [[Long-tailed distribution]], [[fat-tailed distribution]]</ref> the regions close to the bounds of the random variable, if the pmf or pdf are relatively low therein. Usually has the form <math>X > a</math>, <math>X < b</math> or a union thereof.
*'''Head''':<ref name='tail' /> the region where the pmf or pdf is relatively high. Usually has the form <math>a < X < b</math>.
* '''[[Expected value]]''' or '''mean''': the [[weighted average]] of the possible values, using their probabilities as their weights; or the continuous analog thereof.
*'''[[Median]]''': the value such that the set of values less than the median, and the set greater than the median, each have probabilities no greater than one-half.
*'''[[Variance]]''': the second moment of the pmf or pdf about the mean; an important measure of the [[Statistical dispersion|dispersion]] of the distribution.
* '''[[Standard deviation]]''': the square root of the variance, and hence another measure of dispersion.
* [[Symmetric probability distribution|'''Symmetry''']]: a property of some distributions in which the portion of the distribution to the left of a specific value (usually the median) is a mirror image of the portion to its right.
*'''[[Skewness]]''': a measure of the extent to which a pmf or pdf "leans" to one side of its mean. The third [[standardized moment]] of the distribution.
*'''[[Kurtosis]]''': a measure of the "fatness" of the tails of a pmf or pdf. The fourth standardized moment of the distribution.
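Most of the basic terms above can be computed directly from a pmf. The Python sketch below uses the pmf shown in the discrete-distribution figure of this article (values 1, 3 and 7 with probabilities 0.2, 0.5 and 0.3). It computes the mode, median, mean, variance, standard deviation, skewness and kurtosis, following the definitions in the list. The helper names are illustrative, and kurtosis here means the plain fourth standardized moment, not the "excess" kurtosis that some texts use.

```python
import math
from fractions import Fraction

# Example pmf: P(X=1)=0.2, P(X=3)=0.5, P(X=7)=0.3, as exact fractions.
pmf = {1: Fraction(1, 5), 3: Fraction(1, 2), 7: Fraction(3, 10)}

mode = max(pmf, key=pmf.get)                              # value with highest probability
mean = sum(x * p for x, p in pmf.items())                 # probability-weighted average
var = sum((x - mean) ** 2 * p for x, p in pmf.items())    # second moment about the mean
std = math.sqrt(var)                                      # square root of the variance

def median(pmf):
    """First support value m with P(X < m) <= 1/2 and P(X > m) <= 1/2."""
    for m in sorted(pmf):
        below = sum(p for x, p in pmf.items() if x < m)
        above = sum(p for x, p in pmf.items() if x > m)
        if below <= Fraction(1, 2) and above <= Fraction(1, 2):
            return m

def standardized_moment(k):
    """k-th standardized moment E[(X - mean)^k] / std^k of the pmf above."""
    return float(sum((x - mean) ** k * p for x, p in pmf.items())) / std ** k

skewness = standardized_moment(3)  # third standardized moment
kurtosis = standardized_moment(4)  # fourth standardized moment (not excess kurtosis)
print(mode, median(pmf), mean, var, std, skewness, kurtosis)
```

For this pmf the mean is 19/5 and the variance is 124/25. The positive skewness reflects the long stretch of probability out to the value 7.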

==Discrete probability distribution==

[[File:Discrete probability distrib.svg|right|thumb|Figure 4: The probability mass function of a discrete probability distribution. The probabilities of the [[Singleton (mathematics)|singleton]]s {1}, {3}, and {7} are respectively 0.2, 0.5, 0.3. A set not containing any of these points has probability zero.]]

[[File:Discrete probability distribution.svg|right|thumb|Figure 5: The [[cumulative distribution function|cdf]] of a discrete probability distribution, ...]]

[[File:Normal probability distribution.svg|right|thumb|Figure 6: ... of a continuous probability distribution, ...]]

[[File:Mixed probability distribution.svg|right|thumb|Figure 7: ... of a distribution which has both a continuous part and a discrete part.]]

A '''discrete probability distribution''' is the probability distribution of a random variable that can take on only a countable number of values.<ref>{{Cite book|title=Probability and stochastics|last=Çınlar|first=Erhan|date=2011|publisher=Springer|isbn=9780387878591|location=New York|pages=51|oclc=710149819}}</ref> If the set of values is infinite, the probabilities have to decline to zero fast enough for them to add up to 1. For example, if <math>\operatorname{P}(X=n) = \tfrac{1}{2^n}</math> for ''n'' = 1, 2, ..., the sum of probabilities would be 1/2 + 1/4 + 1/8 + ... = 1.
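The example P(X = n) = 1/2<sup>n</sup> can be checked with exact arithmetic. In the Python sketch below, the partial sums P(X ≤ n) equal 1 − 1/2<sup>n</sup>, so the probability mass not yet accounted for halves at every step and the series converges to 1. The function names `pmf` and `cdf` are illustrative.

```python
from fractions import Fraction

def pmf(n):
    """P(X = n) = 1 / 2**n for n = 1, 2, 3, ..."""
    return Fraction(1, 2 ** n)

def cdf(n):
    """P(X <= n): the partial sum 1/2 + 1/4 + ... + 1/2**n."""
    return sum((pmf(k) for k in range(1, n + 1)), Fraction(0))

for n in (1, 2, 3, 10):
    print(n, cdf(n), 1 - cdf(n))  # the remaining mass 1 - P(X <= n) equals 1/2**n
```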

Well-known discrete probability distributions used in statistical modeling include the [[Poisson distribution]], the [[Bernoulli distribution]], the [[binomial distribution]], the [[geometric distribution]], and the [[negative binomial distribution]].<ref name=":1" /> Additionally, the [[Uniform distribution (discrete)|discrete uniform distribution]] is commonly used in computer programs that make equal-probability random selections between a number of choices.

It is also possible to think in the opposite direction, which allows more flexibility. Suppose <math>F(x)</math> is a non-decreasing, right-continuous function with <math>\lim_{x \rightarrow -\infty} F(x) = 0</math> and <math>\lim_{x \rightarrow \infty} F(x) = 1</math>. Then <math>F</math> is the cumulative distribution function of some random variable: a discrete random variable if <math>F</math> is a step function, and a continuous one if <math>F</math> is continuous. This allows for continuous distributions that have a cumulative distribution function but no probability density function, such as the Cantor distribution.
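For the discrete case, the step-function form of F can be made concrete. The Python sketch below builds the right-continuous step CDF of the pmf from the discrete-distribution figure (values 1, 3 and 7 with probabilities 0.2, 0.5 and 0.3). It then checks that the jumps of F at the support points give back the pmf. Evaluating F just to the left of each point with a tiny offset `eps` is a shortcut chosen for this example, not a general method.

```python
from bisect import bisect_right
from fractions import Fraction

# Example pmf: P(X=1)=0.2, P(X=3)=0.5, P(X=7)=0.3.
pmf = {1: Fraction(1, 5), 3: Fraction(1, 2), 7: Fraction(3, 10)}
support = sorted(pmf)

# Running totals of the pmf give the value of F on each step.
cumulative = []
running = Fraction(0)
for x in support:
    running += pmf[x]
    cumulative.append(running)

def cdf(x):
    """F(x) = P(X <= x): a right-continuous step function."""
    i = bisect_right(support, x)  # number of support points <= x
    return cumulative[i - 1] if i else Fraction(0)

# The jump of F at each support point recovers the pmf.
eps = Fraction(1, 10 ** 9)
jumps = {x: cdf(x) - cdf(x - eps) for x in support}
print([str(cdf(x)) for x in (0, 1, 2, 3, 6.5, 7, 100)])
print(jumps == pmf)
```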

When a [[Sample (statistics)|sample]] (a set of observations) is drawn from a larger population, the sample points have an [[empirical distribution function|empirical distribution]] that is discrete and that provides information about the population distribution.
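The following Python sketch connects the discrete uniform distribution and the empirical distribution. It draws a sample of equal-probability random selections with the standard library's `random.Random.choice`, then forms the empirical distribution as the relative frequency of each observed value. The four choices, the sample size and the seed are arbitrary assumptions for illustration.

```python
import random
from collections import Counter

rng = random.Random(12345)      # fixed seed so the run is reproducible
choices = ["a", "b", "c", "d"]  # hypothetical set of equally likely choices

# Discrete uniform distribution: each choice is selected with probability 1/4.
sample = [rng.choice(choices) for _ in range(10_000)]

# Empirical distribution of the sample: relative frequency of each observed value.
counts = Counter(sample)
empirical = {value: count / len(sample) for value, count in sorted(counts.items())}
print(empirical)  # each relative frequency should be close to 0.25
```

The empirical distribution is itself discrete, and as the sample grows its relative frequencies tend to settle near the population probabilities of 1/4.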

It is often necessary to generalize the above definition for more arbitrary subsets of the real line. In these contexts, a continuous probability distribution is defined as a probability distribution with a cumulative distribution function that is absolutely continuous. Equivalently, it is a probability distribution on the real numbers that is absolutely continuous with respect to the Lebesgue measure. Such distributions can be represented by their probability density functions. If <math>X</math> is such an absolutely continuous random variable, then it has a probability density function <math>f(x)</math>, and its probability of falling into a Lebesgue-measurable set <math>A \subset \mathbb{R}</math> is:

:<math>\operatorname{P}\left[X \in A\right] = \int_A f(x) \, d\mu</math>

where <math>\mu</math> is the Lebesgue measure.

Note on terminology: some authors use the term "continuous distribution" to denote distributions whose cumulative distribution functions are continuous, rather than absolutely continuous. These distributions are the ones <math>\mu</math> such that <math>\mu\{x\}\,=\,0</math> for all <math>\,x</math>. This definition includes the (absolutely) continuous distributions defined above, but it also includes singular distributions, which are neither absolutely continuous nor discrete nor a mixture of those, and do not have a density. An example is given by the Cantor distribution.

===Measure theoretic formulation===

A [[measurable function]] <math>X \colon A \to B </math> between a [[probability space]] <math>(A, \mathcal A, P)</math> and a [[measurable space]] <math>(B, \mathcal B) </math> is called a '''discrete random variable''' provided that its image is a countable set. In this case, measurability of <math>X</math> means that the pre-images of singleton sets are measurable, i.e., <math>X^{-1}(\{b\}) \in \mathcal A</math> for all <math>b \in B</math>.

The latter requirement induces a [[probability mass function]] <math>f_X \colon X(A) \to \mathbb R</math> via <math> f_X(b):=P(X^{-1}(\{b\}))</math>. Since the pre-images of disjoint sets are disjoint,

:<math>\sum_{b \in X(A)} f_X(b) = \sum_{b \in X(A)} P(X^{-1} (\{b\})) = P \left( \bigcup_{b \in X(A)} X^{-1}(\{b\}) \right) = P(A)=1.</math>

This recovers the definition given above.
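The construction of <math>f_X</math> from pre-images can be mirrored on a small finite probability space. In the Python sketch below, the space A is assumed to be the eight equally likely outcomes of three fair coin tosses, and X counts the heads. The sketch computes each pre-image <math>X^{-1}(\{b\})</math>, sums P over it to obtain <math>f_X(b)</math>, and checks that the total is P(A) = 1. The coin-toss space and all names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# Finite probability space A: all outcomes of three fair coin tosses, each with probability 1/8.
A = list(product("HT", repeat=3))
P = {omega: Fraction(1, 8) for omega in A}

def X(omega):
    """Random variable on A: the number of heads in outcome omega."""
    return omega.count("H")

def preimage(b):
    """X^{-1}({b}): all outcomes omega with X(omega) = b."""
    return [omega for omega in A if X(omega) == b]

image = sorted({X(omega) for omega in A})  # the image X(A), a countable (here finite) set
f_X = {b: sum((P[w] for w in preimage(b)), Fraction(0)) for b in image}

print(f_X)                # pmf induced through pre-images of singletons
print(sum(f_X.values()))  # pre-images are disjoint and cover A, so the total is P(A) = 1
```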

Ryan (75 edits)