Random variable

From Swarma Wiki (集智百科)

This entry was initially translated by 11: https://wiki.swarma.org/index.php?title=%E5%B9%B3%E8%A1%A1%E7%90%86%E8%AE%BA#:~:text=%E6%9C%AC%E8%AF%8D%E6%9D%A1%E7%94%B1,11%E5%88%9D%E6%AD%A5%E7%BF%BB%E8%AF%91




In probability and statistics, a random variable, random quantity, aleatory variable, or stochastic variable is described informally as a variable whose values depend on outcomes of a random phenomenon.[1] The formal mathematical treatment of random variables is a topic in probability theory. In that context, a random variable is understood as a measurable function defined on a probability space that maps from the sample space to the real numbers.[2]


文件:Random Variable as a Function-en.svg
This graph shows how a random variable is a function from all possible outcomes to real values. It also shows how random variables are used for defining probability mass functions.

A random variable's possible values might represent the possible outcomes of a yet-to-be-performed experiment, or the possible outcomes of a past experiment whose already-existing value is uncertain (for example, because of imprecise measurements or quantum uncertainty). They may also conceptually represent either the results of an "objectively" random process (such as rolling a die) or the "subjective" randomness that results from incomplete knowledge of a quantity. The meaning of the probabilities assigned to the potential values of a random variable is not part of probability theory itself, but is instead related to philosophical arguments over the interpretation of probability. The mathematics works the same regardless of the particular interpretation in use.



As a function, a random variable is required to be measurable, which allows for probabilities to be assigned to sets of its potential values. It is common that the outcomes depend on some physical variables that are not predictable. For example, when tossing a fair coin, the final outcome of heads or tails depends on the uncertain physical conditions, so the outcome being observed is uncertain. The coin could get caught in a crack in the floor, but such a possibility is excluded from consideration.



The domain of a random variable is called a sample space. It is interpreted as the set of possible outcomes of a random phenomenon. For example, in the case of a coin toss, only two possible outcomes are considered, namely heads or tails.



A random variable has a probability distribution, which specifies the probability of Borel subsets of its range. Random variables can be discrete, that is, taking any of a specified finite or countable list of values (having a countable range), endowed with a probability mass function that is characteristic of the random variable's probability distribution; or continuous, taking any numerical value in an interval or collection of intervals (having an uncountable range), via a probability density function that is characteristic of the random variable's probability distribution; or a mixture of both.
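The discrete/continuous distinction above can be made concrete in a short sketch. The die and the 360-degree uniform range are illustrative choices, not taken from the article:

```python
# Sketch: a discrete random variable with a probability mass function,
# and a continuous one with a (constant) probability density function.
from fractions import Fraction

# Discrete: a fair six-sided die. The PMF assigns probability 1/6 to each value.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def prob_discrete(values):
    """P(X in values) is the sum of the PMF over the listed values."""
    return sum(pmf.get(v, Fraction(0)) for v in values)

# Continuous: uniform on [0, 360). The PDF is the constant 1/360, so the
# probability of an interval is its length times 1/360 (an integral of the PDF).
def prob_uniform(a, b, lo=0.0, hi=360.0):
    """P(a <= X <= b) for X uniform on [lo, hi)."""
    density = 1.0 / (hi - lo)
    return max(0.0, min(b, hi) - max(a, lo)) * density

print(prob_discrete({2, 4, 6}))   # P(even value) = 1/2
print(prob_uniform(0, 180))       # P(X in [0, 180]) = 0.5
```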



Two random variables with the same probability distribution can still differ in terms of their associations with, or independence from, other random variables. The realizations of a random variable, that is, the results of randomly choosing values according to the variable's probability distribution function, are called random variates.
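The distinction between a variable's distribution and its association with other variables can be sketched numerically; the seed and sample size below are arbitrary:

```python
# Sketch: two random variables with the same Bernoulli(1/2) distribution
# can be perfectly anti-dependent or independent with respect to a third.
import random

rng = random.Random(42)
X = [rng.randrange(2) for _ in range(10000)]   # realizations ("variates") of X

Y = [1 - x for x in X]           # Y has the same distribution as X...
assert all(x != y for x, y in zip(X, Y))       # ...but never agrees with it.

Z = [rng.randrange(2) for _ in range(10000)]   # an independent copy: same
agree = sum(x == z for x, z in zip(X, Z))      # distribution, agrees with X
print(agree / len(X))                          # about half the time
```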



Definition



A random variable is a measurable function [math]\displaystyle{ X \colon \Omega \to E }[/math] from a set of possible outcomes [math]\displaystyle{ \Omega }[/math] to a measurable space [math]\displaystyle{ E }[/math]. The technical axiomatic definition requires [math]\displaystyle{ \Omega }[/math] to be a sample space of a probability triple [math]\displaystyle{ (\Omega, \mathcal{F}, \operatorname{P}) }[/math] (see the measure-theoretic definition). A random variable is often denoted by capital roman letters such as [math]\displaystyle{ X }[/math], [math]\displaystyle{ Y }[/math], [math]\displaystyle{ Z }[/math], [math]\displaystyle{ T }[/math].[3][4]



The probability that [math]\displaystyle{ X }[/math] takes on a value in a measurable set [math]\displaystyle{ S\subseteq E }[/math] is written as



[math]\displaystyle{ \operatorname{P}(X \in S) = \operatorname{P}(\{ \omega \in \Omega \mid X(\omega) \in S \}) }[/math][3]
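This preimage formula can be sketched directly for a finite sample space; the die and the parity variable below are illustrative, not from the article:

```python
# Sketch: P(X in S) computed as the probability of the preimage
# {omega : X(omega) in S}, per the formula above.
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]              # sample space of a fair die
P = {w: Fraction(1, 6) for w in omega}  # probability measure on the outcomes
X = lambda w: w % 2                     # random variable: parity of the roll

def prob(S):
    preimage = {w for w in omega if X(w) in S}
    return sum(P[w] for w in preimage)

print(prob({1}))   # three odd outcomes -> 1/2
```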


Standard case


Any random variable can be described by its cumulative distribution function, which describes the probability that the random variable will be less than or equal to a certain value.



In many cases, [math]\displaystyle{ X }[/math] is real-valued, i.e. [math]\displaystyle{ E = \mathbb{R} }[/math]. In some contexts, the term random element (see extensions) is used to denote a random variable not of this form.


When the image (or range) of [math]\displaystyle{ X }[/math] is countable, the random variable is called a discrete random variable[5]:399 and its distribution is a discrete probability distribution, i.e. can be described by a probability mass function that assigns a probability to each value in the image of [math]\displaystyle{ X }[/math]. If the image is uncountably infinite (usually an interval) then [math]\displaystyle{ X }[/math] is called a continuous random variable.[6] In the special case that it is absolutely continuous, its distribution can be described by a probability density function, which assigns probabilities to intervals; in particular, each individual point must necessarily have probability zero for an absolutely continuous random variable. Not all continuous random variables are absolutely continuous,[7] a mixture distribution is one such counterexample; such random variables cannot be described by a probability density or a probability mass function.

The term "random variable" in statistics is traditionally limited to the real-valued case ([math]\displaystyle{ E=\mathbb{R} }[/math]). In this case, the structure of the real numbers makes it possible to define quantities such as the expected value and variance of a random variable, its cumulative distribution function, and the moments of its distribution.

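The quantities just mentioned are straightforward to compute for a discrete distribution; a minimal sketch with a fair die (an illustrative choice, not from the article):

```python
# Sketch: expected value, variance, and raw moments of a real-valued
# discrete random variable, computed from its distribution.
from fractions import Fraction

pmf = {k: Fraction(1, 6) for k in range(1, 7)}   # fair six-sided die

def moment(n):
    """n-th raw moment E[X^n] of the distribution."""
    return sum(Fraction(x) ** n * p for x, p in pmf.items())

mean = moment(1)                  # E[X] = 7/2
variance = moment(2) - mean**2    # Var(X) = E[X^2] - E[X]^2 = 35/12
print(mean, variance)
```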





Extensions






However, the definition above is valid for any measurable space [math]\displaystyle{ E }[/math] of values. Thus one can consider random elements of other sets [math]\displaystyle{ E }[/math], such as random boolean values, categorical values, complex numbers, vectors, matrices, sequences, trees, sets, shapes, manifolds, and functions. One may then specifically refer to a random variable of type [math]\displaystyle{ E }[/math], or an [math]\displaystyle{ E }[/math]-valued random variable.


This more general concept of a random element is particularly useful in disciplines such as graph theory, machine learning, natural language processing, and other fields in discrete mathematics and computer science, where one is often interested in modeling the random variation of non-numerical data structures. In some cases, it is nonetheless convenient to represent each element of [math]\displaystyle{ E }[/math], using one or more real numbers. In this case, a random element may optionally be represented as a vector of real-valued random variables (all defined on the same underlying probability space [math]\displaystyle{ \Omega }[/math], which allows the different random variables to covary). For example:

  • A random word may be represented as a random integer that serves as an index into the vocabulary of possible words. Alternatively, it can be represented as a random indicator vector, whose length equals the size of the vocabulary, where the only values of positive probability are [math]\displaystyle{ (1 \ 0 \ 0 \ 0 \ \cdots) }[/math], [math]\displaystyle{ (0 \ 1 \ 0 \ 0 \ \cdots) }[/math], [math]\displaystyle{ (0 \ 0 \ 1 \ 0 \ \cdots) }[/math] and the position of the 1 indicates the word.
  • A random sentence of given length [math]\displaystyle{ N }[/math] may be represented as a vector of [math]\displaystyle{ N }[/math] random words.


  • A random graph on [math]\displaystyle{ N }[/math] given vertices may be represented as a [math]\displaystyle{ N \times N }[/math] matrix of random variables, whose values specify the adjacency matrix of the random graph.
  • A random function [math]\displaystyle{ F }[/math] may be represented as a collection of random variables [math]\displaystyle{ F(x) }[/math], giving the function's values at the various points [math]\displaystyle{ x }[/math] in the function's domain. The [math]\displaystyle{ F(x) }[/math] are ordinary real-valued random variables provided that the function is real-valued. For example, a stochastic process is a random function of time, a random vector is a random function of some index set such as [math]\displaystyle{ 1,2,\ldots, n }[/math], and a random field is a random function on any set (typically time, space, or a discrete set).
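The first two bullets can be sketched as a toy example; the vocabulary and seed below are made up for illustration:

```python
# Sketch: a random word represented both as a random index into a vocabulary
# and as an indicator (one-hot) vector whose single 1 marks the chosen word.
import random

vocab = ["heads", "tails", "edge"]
rng = random.Random(0)

def random_word_index():
    return rng.randrange(len(vocab))

def one_hot(i, size):
    return [1 if j == i else 0 for j in range(size)]

i = random_word_index()
vec = one_hot(i, len(vocab))
assert vec.count(1) == 1 and vec.index(1) == i

# A random sentence of given length N is then a vector of N random words:
sentence = [vocab[random_word_index()] for _ in range(4)]
print(sentence)
```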



Distribution functions




If a random variable [math]\displaystyle{ X\colon \Omega \to \mathbb{R} }[/math] defined on the probability space [math]\displaystyle{ (\Omega, \mathcal{F}, \operatorname{P}) }[/math] is given, we can ask questions like "How likely is it that the value of [math]\displaystyle{ X }[/math] is equal to 2?". This is the same as the probability of the event [math]\displaystyle{ \{ \omega : X(\omega) = 2 \}\,\! }[/math] which is often written as [math]\displaystyle{ P(X = 2)\,\! }[/math] or [math]\displaystyle{ p_X(2) }[/math] for short.
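A concrete sketch of this question, using the sum of two fair dice as an illustrative random variable:

```python
# Sketch: p_X(2) = P(X = 2) computed as the probability of the
# event {omega : X(omega) = 2} on a finite probability space.
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))     # two fair dice
P = {w: Fraction(1, 36) for w in omega}
X = lambda w: w[0] + w[1]                        # X = sum of the dice

def p_X(x):
    return sum(P[w] for w in omega if X(w) == x)

print(p_X(2))   # only (1, 1) sums to 2 -> 1/36
```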




Recording all these probabilities of output ranges of a real-valued random variable [math]\displaystyle{ X }[/math] yields the probability distribution of [math]\displaystyle{ X }[/math]. The probability distribution "forgets" about the particular probability space used to define [math]\displaystyle{ X }[/math] and only records the probabilities of various values of [math]\displaystyle{ X }[/math]. Such a probability distribution can always be captured by its cumulative distribution function


[math]\displaystyle{ F_X(x) = \operatorname{P}(X \le x) }[/math]


and sometimes also using a probability density function, [math]\displaystyle{ p_X }[/math]. In measure-theoretic terms, we use the random variable [math]\displaystyle{ X }[/math] to "push-forward" the measure [math]\displaystyle{ P }[/math] on [math]\displaystyle{ \Omega }[/math] to a measure [math]\displaystyle{ p_X }[/math] on [math]\displaystyle{ \mathbb{R} }[/math].

The underlying probability space [math]\displaystyle{ \Omega }[/math] is a technical device used to guarantee the existence of random variables, sometimes to construct them, and to define notions such as correlation and dependence or independence based on a joint distribution of two or more random variables on the same probability space. In practice, one often disposes of the space [math]\displaystyle{ \Omega }[/math] altogether and just puts a measure on [math]\displaystyle{ \mathbb{R} }[/math] that assigns measure 1 to the whole real line, i.e., one works with probability distributions instead of random variables. See the article on quantile functions for fuller development.
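How a CDF captures the push-forward distribution can be sketched on a finite space; the two-coin example is illustrative:

```python
# Sketch: F_X(x) = P(X <= x) records the push-forward of P under X and
# "forgets" the underlying space Omega, keeping only probabilities of values.
from fractions import Fraction
from itertools import product

omega = list(product((0, 1), repeat=2))           # two fair coin flips
P = {w: Fraction(1, 4) for w in omega}
X = lambda w: w[0] + w[1]                         # number of heads

def F_X(x):
    return sum(P[w] for w in omega if X(w) <= x)  # P(X <= x)

print([F_X(x) for x in (0, 1, 2)])   # [1/4, 3/4, 1]
```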

In an experiment a person may be chosen at random, and one random variable may be the person's height. Mathematically, the random variable is interpreted as a function which maps the person to the person's height. Associated with the random variable is a probability distribution that allows the computation of the probability that the height is in any subset of possible values, such as the probability that the height is between 180 and 190 cm, or the probability that the height is either less than 150 or more than 200 cm.

在一个实验中,一个人可能是随机选择的,随机变量可能是这个人的身高。数学上,随机变量被解释为一个函数,它把人和人的身高联系起来。与随机变量相关联的是一个概率分布,它允许计算高度处于任何可能值子集中的概率,例如高度在180到190厘米之间的概率,或者高度小于150或大于200厘米的概率。


Examples

事例

Another random variable may be the person's number of children; this is a discrete random variable with non-negative integer values. It allows the computation of probabilities for individual integer values – the probability mass function (PMF) – or for sets of values, including infinite sets. For example, the event of interest may be "an even number of children". For both finite and infinite event sets, their probabilities can be found by adding up the PMFs of the elements; that is, the probability of an even number of children is the infinite sum \operatorname{PMF}(0) + \operatorname{PMF}(2) + \operatorname{PMF}(4) + \cdots.

另一个随机变量可能是一个人的子女数量;这是一个非负整数值的离散随机变量。 它允许计算单个整数值的概率--概率质量函数(PMF)--或值的集合,包括无限集合的概率。 例如,事件可能为 "有偶数个子女"。 对于有限和无限事件集,它们的概率可以通过元素的PMF相加来计算;也就是说,有偶数个子女的概率是{ PMF }(0) + { PMF }(2) + { PMF }(4) + 点的无限和。


Discrete random variable

离散型随机变量

In examples such as these, the sample space is often suppressed, since it is mathematically hard to describe, and the possible values of the random variables are then treated as a sample space. But when two random variables are measured on the same sample space of outcomes, such as the height and number of children being computed on the same random persons, it is easier to track their relationship if it is acknowledged that both height and number of children come from the same random person, for example so that questions of whether such random variables are correlated or not can be posed.

在这样的例子中,由于样本空间在数学上很难描述,所以往往会对样本空间进行抑制,然后将随机变量的可能值作为一个样本空间来处理。但是当两个随机变量在相同的结果样本空间中被测量时,例如在相同的随机人群中计算出的身高和子女数量,如果承认身高和儿童的数量都来自同一个随机人群,则更容易追踪他们之间的关系,例如这样就可以提出这些随机变量是否相关的问题。


In an experiment a person may be chosen at random, and one random variable may be the person's height. Mathematically, the random variable is interpreted as a function which maps the person to the person's height. Associated with the random variable is a probability distribution that allows the computation of the probability that the height is in any subset of possible values, such as the probability that the height is between 180 and 190 cm, or the probability that the height is either less than 150 or more than 200 cm.

If [math]\displaystyle{ \{a_n\}, \{b_n\} }[/math] are countable sets of real numbers, [math]\displaystyle{ b_n \gt 0 }[/math] and \sum_n b_n=1, then F=\sum_n b_n \delta_{a_n} is a discrete distribution function. Here \delta_t(x) = 0 for x < t, \delta_t(x) = 1 for x \ge t. Taking for instance an enumeration of all rational numbers as \{a_n\}, one gets a discrete distribution function that is not a step function or piecewise constant. There are no "gaps", which would correspond to numbers which have a finite probability of occurring. Instead, continuous random variables almost never take an exact prescribed value c (formally, [math]\displaystyle{ \forall c \in \mathbb{R}:\; \Pr(X = c) = 0 }[/math]) but there is a positive probability that its value will lie in particular intervals which can be arbitrarily small. Continuous random variables usually admit probability density functions (PDF), which characterize their CDF and probability measures;

如果[math]\displaystyle{ {a_n/}, \{b_n/} }[/math]是可计数的实数集,[math]\displaystyle{ b_n \gt 0 }[/math]并且\sum_n b_n=1,那么F=\sum_n b_n \delta_{a_n}是一个离散分布函数。这里,对于x < t,delta_t(x) = 0,对于x \ge t,delta_t(x) = 1。例如,将所有有理数的枚举取名为 \{a_n\},就可以得到一个离散分布函数,它不是阶跃函数,也不是片断常数。没有 "空隙",而空隙对应的是数字出现的概率是有限的。 相反,连续随机变量几乎从不取一个精确的规定值c(形式上,[math]\displaystyle{ /forall c \in \mathbb{R}:\; \Pr(X = c) = 0 }[/math]),但存在正的概率,它的值将位于特定的区间,这个区间可以任意小。连续型随机变量通常包含概率密度函数(PDF) ,它表征了它们的 CDF 和概率测度;


such distributions are also called absolutely continuous; but some continuous distributions are singular, or mixes of an absolutely continuous part and a singular part.

这样的分布也称为绝对连续分布,但有些连续分布是奇异型的,或者是绝对连续部分和奇异部分的混合。

Another random variable may be the person's number of children; this is a discrete random variable with non-negative integer values. It allows the computation of probabilities for individual integer values – the probability mass function (PMF) – or for sets of values, including infinite sets. For example, the event of interest may be "an even number of children". For both finite and infinite event sets, their probabilities can be found by adding up the PMFs of the elements; that is, the probability of an even number of children is the infinite sum [math]\displaystyle{ \operatorname{PMF}(0) + \operatorname{PMF}(2) + \operatorname{PMF}(4) + \cdots }[/math].


An example of a continuous random variable would be one based on a spinner that can choose a horizontal direction. Then the values taken by the random variable are directions. We could represent these directions by North, West, East, South, Southeast, etc. However, it is commonly more convenient to map the sample space to a random variable which takes values which are real numbers. This can be done, for example, by mapping a direction to a bearing in degrees clockwise from North. The random variable then takes values which are real numbers from the interval [0, 360), with all parts of the range being "equally likely". In this case, X = the angle spun. Any real number has probability zero of being selected, but a positive probability can be assigned to any range of values. For example, the probability of choosing a number in [0, 180] is . Instead of speaking of a probability mass function, we say that the probability density of X is 1/360. The probability of a subset of [0, 360) can be calculated by multiplying the measure of the set by 1/360. In general, the probability of a set for a given continuous random variable can be calculated by integrating the density over the given set.

一个连续随机变量的例子是基于一个可以选择水平方向的旋转器。那么随机变量所取的值就是方向。我们可以用北、西、东、南、东南等表示这些方向。然而,通常更方便的做法是将样本空间映射到一个随机变量上,该变量的取值是实数。例如,可以通过将一个方向映射到一个从北到南的顺时针方向的方位上。然后,随机变量从区间[0, 360)中取实数值,范围内的所有部分都有 "等可能性"。在这种情况下,X=旋转的角度。任何实数被选中的概率都为零,但可以给任何范围的值赋予一个正的概率。例如,在[0,180]中选择一个数字的概率。X的概率密度而不是概率质量函数是1/360。[0,360]子集的概率可以用集合的度量乘以1/360来计算。一般来说,一个给定的连续随机变量的集合的概率可以通过积分给定集合上的密度来计算。

In examples such as these, the sample space is often suppressed, since it is mathematically hard to describe, and the possible values of the random variables are then treated as a sample space. But when two random variables are measured on the same sample space of outcomes, such as the height and number of children being computed on the same random persons, it is easier to track their relationship if it is acknowledged that both height and number of children come from the same random person, for example so that questions of whether such random variables are correlated or not can be posed.


More formally, given any interval [math]\displaystyle{ I = [a, b] = \{x \in \mathbb{R} : a \le x \le b \} }[/math], a random variable X_I \sim \operatorname{U}(I) = \operatorname{U}[a, b] is called a "continuous uniform random variable" (CURV) if the probability that it takes a value in a subinterval depends only on the length of the subinterval. This implies that the probability of X_I falling in any subinterval [c, d] \sube [a, b] is proportional to the length of the subinterval, that is, if , one has

更正式地说,给定任意区间 < math display = " " inline" > i = [ a,b ] = { x in mathbb { r } : a le x le b } </math > ,如果一个随机变量 x _ i sim { u }(i) = { u }[ a,b ]在一个子区间内取值的概率只取决于子区间的长度,则称为“连续型均匀随机变量”(continuous uniform random variable,CURV)。这意味着 x _ i 在任意子区间[ c,d ] sube [ a,b ]下降的概率与子区间的长度成正比,也就是说,如果有一个子区间[ c,d ] sube [ a,b ]

If [math]\displaystyle{ \{a_n\}, \{b_n\} }[/math] are countable sets of real numbers, [math]\displaystyle{ b_n \gt 0 }[/math] and [math]\displaystyle{ \sum_n b_n=1 }[/math], then [math]\displaystyle{ F=\sum_n b_n \delta_{a_n} }[/math] is a discrete distribution function. Here [math]\displaystyle{ \delta_t(x) = 0 }[/math] for [math]\displaystyle{ x \lt t }[/math], [math]\displaystyle{ \delta_t(x) = 1 }[/math] for [math]\displaystyle{ x \ge t }[/math]. Taking for instance an enumeration of all rational numbers as [math]\displaystyle{ \{a_n\} }[/math], one gets a discrete distribution function that is not a step function or piecewise constant.[5]


[math]\displaystyle{ “数学显示屏” ====Coin toss==== 抛硬币 \Pr\left( X_I \in [c,d]\right) Pr 左(x _ i 在[ c,d ]右) = \frac{d - c}{b - a}\Pr\left( X_I \in I\right)= \frac{d - c}{b - a} }[/math]

= frac { d-c }{ b-a } Pr left (x _ i in i right) = frac { d-c }{ b-a } </math >

The possible outcomes for one coin toss can be described by the sample space [math]\displaystyle{ \Omega = \{\text{heads}, \text{tails}\} }[/math]. We can introduce a real-valued random variable [math]\displaystyle{ Y }[/math] that models a $1 payoff for a successful bet on heads as follows:

[math]\displaystyle{ where the last equality results from the unitarity axiom of probability. The probability density function of a CURV X \sim \operatorname {U}[a, b] is given by the indicator function of its interval of support normalized by the interval's length: \lt math display="block"\gt f_X(x) = \begin{cases} 最后一个等式来自于统一性的概率公理。一个 CURV x sim 操作器名称{ u }[ a,b ]的概率密度函数是由它的支持区间的指示函数被区间的长度归一化得到的: \lt math display = " block" \gt f _ x (x) = begin { cases } Y(\omega) = \displaystyle{1 \over b-a}, & a \le x \le b \\ 显示风格{1 over b-a } ,& a le x le b \begin{cases} 0, & \text{otherwise}. 0,& text { otherwise }. 1, & \text{if } \omega = \text{heads}, \\[6pt] \end{cases} }[/math]Of particular interest is the uniform distribution on the unit interval [0, 1]. Samples of any desired probability distribution \operatorname{D} can be generated by calculating the quantile function of \operatorname{D} on a randomly-generated number distributed uniformly on the unit interval. This exploits properties of cumulative distribution functions, which are a unifying framework for all random variables.

特别值得关注的是单位区间[0,1]上的均匀分布。 任何所需概率分布的样本{D}都可以通过计算均匀分布在单位区间上的随机生成数的量化函数来生成。 这就利用了累积分布函数的特性,它是所有随机变量的统一框架。

0, & \text{if } \omega = \text{tails}.

\end{cases}

</math>


A mixed random variable is a random variable whose cumulative distribution function is neither piecewise-constant (a discrete random variable) nor everywhere-continuous. This definition enables us to measure any subset B\in \mathcal{E} in the target space by looking at its preimage, which by assumption is measurable.

一个混合的随机变量是一个随机变量,它的累积分布函数既不是分段常数(一个离散的随机变量) ,也不是处处连续的。这个定义使我们能够通过观察目标空间中数学{E}中的任意子集B的前象来测量,根据假设,前象是可测量的。

If the coin is a fair coin, Y has a probability mass function [math]\displaystyle{ f_Y }[/math] given by:



[math]\displaystyle{ f_Y(y) = \begin{cases} \tfrac 12,& \text{if }y=1,\\[6pt] \tfrac 12,& \text{if }y=0, \end{cases} }[/math]
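As a quick sanity check, this pmf can be estimated by simulation; the following is a minimal sketch (the function and variable names are ours, not from the text):

```python
import random

random.seed(5)

# Y maps a coin-toss outcome to a $1 payoff on heads, 0 on tails.
def Y(omega):
    return 1 if omega == "heads" else 0

# Simulate a fair coin and estimate the pmf of Y empirically.
tosses = [random.choice(["heads", "tails"]) for _ in range(100_000)]
p_1 = sum(Y(w) == 1 for w in tosses) / len(tosses)  # estimate of f_Y(1)
p_0 = sum(Y(w) == 0 for w in tosses) / len(tosses)  # estimate of f_Y(0)
```

Both estimates should land near 1/2 for a fair coin.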



Dice roll




文件:Dice Distribution (bar).svg
If the sample space is the set of possible numbers rolled on two dice, and the random variable of interest is the sum S of the numbers on the two dice, then S is a discrete random variable whose distribution is described by the probability mass function plotted as the height of picture columns here.


A random variable can also be used to describe the process of rolling dice and the possible outcomes. The most obvious representation for the two-dice case is to take the set of pairs of numbers n1 and n2 from {1, 2, 3, 4, 5, 6} (representing the numbers on the two dice) as the sample space. The total number rolled (the sum of the numbers in each pair) is then a random variable X given by the function that maps the pair to the sum:



[math]\displaystyle{ X((n_1, n_2)) = n_1 + n_2 }[/math]



and (if the dice are fair) has a probability mass function [math]\displaystyle{ f_X }[/math] given by:



[math]\displaystyle{ f_X(S) = \frac{\min(S-1, 13-S)}{36}, \text{ for } S \in \{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12\} }[/math]
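The closed form above can be checked by brute-force enumeration of the 36 equally likely outcomes; a short sketch (variable names are ours):

```python
from collections import Counter
from fractions import Fraction

# Tally the sum S = n1 + n2 over all 36 equally likely dice outcomes.
counts = Counter(n1 + n2 for n1 in range(1, 7) for n2 in range(1, 7))

# Exact pmf of S as fractions with denominator 36.
pmf = {s: Fraction(c, 36) for s, c in counts.items()}

# Agreement with f_X(S) = min(S - 1, 13 - S) / 36 for S = 2, ..., 12.
for s in range(2, 13):
    assert pmf[s] == Fraction(min(s - 1, 13 - s), 36)
```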


Continuous random variable




Formally, a continuous random variable is a random variable whose cumulative distribution function is continuous everywhere.[8] There are no "gaps", which would correspond to numbers which have a finite probability of occurring. Instead, continuous random variables almost never take an exact prescribed value c (formally, [math]\displaystyle{ \forall c \in \mathbb{R}:\; \Pr(X = c) = 0 }[/math]) but there is a positive probability that its value will lie in particular intervals which can be arbitrarily small. Continuous random variables usually admit probability density functions (PDF), which characterize their CDF and probability measures;


such distributions are also called absolutely continuous; but some continuous distributions are singular, or mixes of an absolutely continuous part and a singular part.



An example of a continuous random variable would be one based on a spinner that can choose a horizontal direction. Then the values taken by the random variable are directions. We could represent these directions by North, West, East, South, Southeast, etc. However, it is commonly more convenient to map the sample space to a random variable which takes values which are real numbers. This can be done, for example, by mapping a direction to a bearing in degrees clockwise from North. The random variable then takes values which are real numbers from the interval [0, 360), with all parts of the range being "equally likely". In this case, X = the angle spun. Any real number has probability zero of being selected, but a positive probability can be assigned to any range of values. For example, the probability of choosing a number in [0, 180] is 1/2. Instead of speaking of a probability mass function, we say that the probability density of X is 1/360. The probability of a subset of [0, 360) can be calculated by multiplying the measure of the set by 1/360. In general, the probability of a set for a given continuous random variable can be calculated by integrating the density over the given set.


More formally, given any interval [math]\displaystyle{ I = [a, b] = \{x \in \mathbb{R} : a \le x \le b \} }[/math], a random variable [math]\displaystyle{ X_I \sim \operatorname{U}(I) = \operatorname{U}[a, b] }[/math] is called a "continuous uniform random variable" (CURV) if the probability that it takes a value in a subinterval depends only on the length of the subinterval. This implies that the probability of [math]\displaystyle{ X_I }[/math] falling in any subinterval [math]\displaystyle{ [c, d] \sube [a, b] }[/math] is proportional to the length of the subinterval, that is, if [math]\displaystyle{ a \le c \le d \le b }[/math], one has


[math]\displaystyle{ \Pr\left( X_I \in [c,d]\right) = \frac{d - c}{b - a}\Pr\left( X_I \in I\right) = \frac{d - c}{b - a} }[/math]


where the last equality results from the unitarity axiom of probability. The probability density function of a CURV [math]\displaystyle{ X \sim \operatorname {U}[a, b] }[/math] is given by the indicator function of its interval of support normalized by the interval's length: [math]\displaystyle{ f_X(x) = \begin{cases} \displaystyle{1 \over b-a}, & a \le x \le b \\ 0, & \text{otherwise}. \end{cases} }[/math]

Of particular interest is the uniform distribution on the unit interval [math]\displaystyle{ [0, 1] }[/math]. Samples of any desired probability distribution [math]\displaystyle{ \operatorname{D} }[/math] can be generated by calculating the quantile function of [math]\displaystyle{ \operatorname{D} }[/math] on a randomly-generated number distributed uniformly on the unit interval. This exploits properties of cumulative distribution functions, which are a unifying framework for all random variables.
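The quantile-function recipe can be sketched concretely. Here we assume an Exponential(λ) target, whose quantile function is F⁻¹(u) = −ln(1 − u)/λ; this particular target is our choice, not the text's:

```python
import math
import random

random.seed(0)

# Quantile function (inverse CDF) of Exponential(lam):
# F(x) = 1 - exp(-lam * x)  =>  F^{-1}(u) = -ln(1 - u) / lam
def exponential_quantile(u, lam=1.0):
    return -math.log(1.0 - u) / lam

# Push uniform [0, 1) samples through the quantile function.
samples = [exponential_quantile(random.random()) for _ in range(100_000)]

# The sample mean should approach the true mean 1 / lam = 1.
mean = sum(samples) / len(samples)
```

The same recipe works for any distribution whose quantile function can be evaluated.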



Mixed type




A mixed random variable is a random variable whose cumulative distribution function is neither piecewise-constant (a discrete random variable) nor everywhere-continuous.[8] It can be realized as the sum of a discrete random variable and a continuous random variable; in which case the CDF will be the weighted average of the CDFs of the component variables.[8]



An example of a random variable of mixed type would be based on an experiment where a coin is flipped and the spinner is spun only if the result of the coin toss is heads. If the result is tails, X = −1; otherwise X = the value of the spinner as in the preceding example. There is a probability of 1/2 that this random variable will have the value −1. Other ranges of values would have half the probabilities of the last example.
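The mixed variable above is easy to simulate; a minimal sketch under the same setup (function names are ours):

```python
import random

random.seed(4)

# Flip a fair coin: tails gives the point mass X = -1,
# heads gives the spinner angle, uniform on [0, 360).
def sample_mixed():
    if random.random() < 0.5:          # tails
        return -1.0
    return random.random() * 360.0     # heads: spin

xs = [sample_mixed() for _ in range(100_000)]

# The atom at -1 carries probability 1/2, and the continuous part
# halves the spinner's probabilities, e.g. P(0 <= X <= 180) = 1/4.
p_atom = sum(x == -1.0 for x in xs) / len(xs)
p_half_turn = sum(0.0 <= x <= 180.0 for x in xs) / len(xs)
```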



Most generally, every probability distribution on the real line is a mixture of discrete part, singular part, and an absolutely continuous part; see Lebesgue's decomposition theorem. The discrete part is concentrated on a countable set, but this set may be dense (like the set of all rational numbers).



Measure-theoretic definition



The most formal, axiomatic definition of a random variable involves measure theory. Continuous random variables are defined in terms of sets of numbers, along with functions that map such sets to probabilities. Because of various difficulties (e.g. the Banach–Tarski paradox) that arise if such sets are insufficiently constrained, it is necessary to introduce what is termed a sigma-algebra to constrain the possible sets over which probabilities can be defined. Normally, a particular such sigma-algebra is used, the Borel σ-algebra, which allows for probabilities to be defined over any sets that can be derived either directly from continuous intervals of numbers or by a finite or countably infinite number of unions and/or intersections of such intervals.[2]



The measure-theoretic definition is as follows.



Let [math]\displaystyle{ (\Omega, \mathcal{F}, P) }[/math] be a probability space and [math]\displaystyle{ (E, \mathcal{E}) }[/math] a measurable space. Then an [math]\displaystyle{ (E, \mathcal{E}) }[/math]-valued random variable is a measurable function [math]\displaystyle{ X\colon \Omega \to E }[/math], which means that, for every subset [math]\displaystyle{ B\in\mathcal{E} }[/math], its preimage [math]\displaystyle{ X^{-1}(B)\in \mathcal{F} }[/math] where [math]\displaystyle{ X^{-1}(B) = \{\omega : X(\omega)\in B\} }[/math].[9] This definition enables us to measure any subset [math]\displaystyle{ B\in \mathcal{E} }[/math] in the target space by looking at its preimage, which by assumption is measurable.



In more intuitive terms, a member of [math]\displaystyle{ \Omega }[/math] is a possible outcome, a member of [math]\displaystyle{ \mathcal{F} }[/math] is a measurable subset of possible outcomes, the function [math]\displaystyle{ P }[/math] gives the probability of each such measurable subset, [math]\displaystyle{ E }[/math] represents the set of values that the random variable can take (such as the set of real numbers), and a member of [math]\displaystyle{ \mathcal{E} }[/math] is a "well-behaved" (measurable) subset of [math]\displaystyle{ E }[/math] (those for which the probability may be determined). The random variable is then a function from any outcome to a quantity, such that the outcomes leading to any useful subset of quantities for the random variable have a well-defined probability.



When [math]\displaystyle{ E }[/math] is a topological space, then the most common choice for the σ-algebra [math]\displaystyle{ \mathcal{E} }[/math] is the Borel σ-algebra [math]\displaystyle{ \mathcal{B}(E) }[/math], which is the σ-algebra generated by the collection of all open sets in [math]\displaystyle{ E }[/math]. In such case the [math]\displaystyle{ (E, \mathcal{E}) }[/math]-valued random variable is called an [math]\displaystyle{ E }[/math]-valued random variable. Moreover, when the space [math]\displaystyle{ E }[/math] is the real line [math]\displaystyle{ \mathbb{R} }[/math], then such a real-valued random variable is called simply a random variable.



Real-valued random variables



In this case the observation space is the set of real numbers. Recall, [math]\displaystyle{ (\Omega, \mathcal{F}, P) }[/math] is the probability space. For a real observation space, the function [math]\displaystyle{ X\colon \Omega \rightarrow \mathbb{R} }[/math] is a real-valued random variable if

[math]\displaystyle{ \{ \omega : X(\omega) \le r \} \in \mathcal{F} \qquad \forall r \in \mathbb{R}. }[/math]



This definition is a special case of the above because the set [math]\displaystyle{ \{(-\infty, r]: r \in \R\} }[/math] generates the Borel σ-algebra on the set of real numbers, and it suffices to check measurability on any generating set. Here we can prove measurability on this generating set by using the fact that [math]\displaystyle{ \{ \omega : X(\omega) \le r \} = X^{-1}((-\infty, r]) }[/math].



Moments


The probability distribution of a random variable is often characterised by a small number of parameters, which also have a practical interpretation. For example, it is often enough to know what its "average value" is. This is captured by the mathematical concept of expected value of a random variable, denoted [math]\displaystyle{ \operatorname{E}[X] }[/math], and also called the first moment. In general, [math]\displaystyle{ \operatorname{E}[f(X)] }[/math] is not equal to [math]\displaystyle{ f(\operatorname{E}[X]) }[/math]. Once the "average value" is known, one could then ask how far from this average value the values of [math]\displaystyle{ X }[/math] typically are, a question that is answered by the variance and standard deviation of a random variable. [math]\displaystyle{ \operatorname{E}[X] }[/math] can be viewed intuitively as an average obtained from an infinite population, the members of which are particular evaluations of [math]\displaystyle{ X }[/math].
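A rough Monte Carlo illustration of the expected value, and of the fact that [math]\displaystyle{ \operatorname{E}[f(X)] }[/math] generally differs from [math]\displaystyle{ f(\operatorname{E}[X]) }[/math]. Taking X uniform on [0, 1] is our choice for concreteness:

```python
import random

random.seed(1)

# Monte Carlo estimate of E[X] for X ~ Uniform[0, 1]; true value is 1/2.
xs = [random.random() for _ in range(200_000)]
e_x = sum(xs) / len(xs)

# With f(x) = x^2: E[X^2] = 1/3, while (E[X])^2 = 1/4, so E[f(X)] != f(E[X]).
e_x2 = sum(x * x for x in xs) / len(xs)
```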



Mathematically, this is known as the (generalised) problem of moments: for a given class of random variables [math]\displaystyle{ X }[/math], find a collection [math]\displaystyle{ \{f_i\} }[/math] of functions such that the expectation values [math]\displaystyle{ \operatorname{E}[f_i(X)] }[/math] fully characterise the distribution of the random variable [math]\displaystyle{ X }[/math].



Moments can only be defined for real-valued functions of random variables (or complex-valued, etc.). If the random variable is itself real-valued, then moments of the variable itself can be taken, which are equivalent to moments of the identity function [math]\displaystyle{ f(X)=X }[/math] of the random variable. However, even for non-real-valued random variables, moments can be taken of real-valued functions of those variables. For example, for a categorical random variable X that can take on the nominal values "red", "blue" or "green", the real-valued function [math]\displaystyle{ [X = \text{green}] }[/math] can be constructed; this uses the Iverson bracket, and has the value 1 if [math]\displaystyle{ X }[/math] has the value "green", 0 otherwise. Then, the expected value and other moments of this function can be determined.



Functions of random variables




A new random variable Y can be defined by applying a real Borel measurable function [math]\displaystyle{ g\colon \mathbb{R} \rightarrow \mathbb{R} }[/math] to the outcomes of a real-valued random variable [math]\displaystyle{ X }[/math]. That is, [math]\displaystyle{ Y=g(X) }[/math]. The cumulative distribution function of [math]\displaystyle{ Y }[/math] is then



[math]\displaystyle{ F_Y(y) = \operatorname{P}(g(X) \le y). }[/math]


If function [math]\displaystyle{ g }[/math] is invertible (i.e., [math]\displaystyle{ h = g^{-1} }[/math] exists, where [math]\displaystyle{ h }[/math] is [math]\displaystyle{ g }[/math]'s inverse function) and is either increasing or decreasing, then the previous relation can be extended to obtain


[math]\displaystyle{ F_Y(y) = \operatorname{P}(g(X) \le y) = \begin{cases} \operatorname{P}(X \le h(y)) = F_X(h(y)), & \text{if } h = g^{-1} \text{ increasing} ,\\ \operatorname{P}(X \ge h(y)) = 1 - F_X(h(y)), & \text{if } h = g^{-1} \text{ decreasing} . \end{cases} }[/math]


With the same hypotheses of invertibility of [math]\displaystyle{ g }[/math], assuming also differentiability, the relation between the probability density functions can be found by differentiating both sides of the above expression with respect to [math]\displaystyle{ y }[/math], in order to obtain[8]



[math]\displaystyle{ f_Y(y) = f_X\bigl(h(y)\bigr) \left| \frac{d h(y)}{d y} \right|. }[/math]



If there is no invertibility of [math]\displaystyle{ g }[/math] but each [math]\displaystyle{ y }[/math] admits at most a countable number of roots (i.e., a finite, or countably infinite, number of [math]\displaystyle{ x_i }[/math] such that [math]\displaystyle{ y = g(x_i) }[/math]) then the previous relation between the probability density functions can be generalized with



[math]\displaystyle{ f_Y(y) = \sum_{i} f_X(g_{i}^{-1}(y)) \left| \frac{d g_{i}^{-1}(y)}{d y} \right| }[/math]



where [math]\displaystyle{ x_i = g_i^{-1}(y) }[/math], according to the inverse function theorem. The formulas for densities do not demand [math]\displaystyle{ g }[/math] to be increasing.



In the measure-theoretic, axiomatic approach to probability, if a random variable [math]\displaystyle{ X }[/math] on [math]\displaystyle{ \Omega }[/math] and a Borel measurable function [math]\displaystyle{ g\colon \mathbb{R} \rightarrow \mathbb{R} }[/math], then [math]\displaystyle{ Y = g(X) }[/math] is also a random variable on [math]\displaystyle{ \Omega }[/math], since the composition of measurable functions is also measurable. (However, this is not necessarily true if [math]\displaystyle{ g }[/math] is Lebesgue measurable.[citation needed]) The same procedure that allowed one to go from a probability space [math]\displaystyle{ (\Omega, P) }[/math] to [math]\displaystyle{ (\mathbb{R}, dF_{X}) }[/math] can be used to obtain the distribution of [math]\displaystyle{ Y }[/math].



Example 1




Let [math]\displaystyle{ X }[/math] be a real-valued, continuous random variable and let [math]\displaystyle{ Y = X^2 }[/math].


[math]\displaystyle{ F_Y(y) = \operatorname{P}(X^2 \le y). }[/math]



If [math]\displaystyle{ y \lt 0 }[/math], then [math]\displaystyle{ P(X^2 \leq y) = 0 }[/math], so



[math]\displaystyle{ F_Y(y) = 0\qquad\hbox{if}\quad y \lt 0. }[/math]


If [math]\displaystyle{ y \geq 0 }[/math], then



[math]\displaystyle{ \operatorname{P}(X^2 \le y) = \operatorname{P}(|X| \le \sqrt{y}) = \operatorname{P}(-\sqrt{y} \le X \le \sqrt{y}), }[/math]



so



[math]\displaystyle{ F_Y(y) = F_X(\sqrt{y}) - F_X(-\sqrt{y})\qquad\hbox{if}\quad y \ge 0. }[/math]



Example 2




Suppose [math]\displaystyle{ X }[/math] is a random variable with a cumulative distribution



[math]\displaystyle{ F_{X}(x) = P(X \leq x) = \frac{1}{(1 + e^{-x})^{\theta}} }[/math]



where [math]\displaystyle{ \theta \gt 0 }[/math] is a fixed parameter. Consider the random variable [math]\displaystyle{ Y = \mathrm{log}(1 + e^{-X}). }[/math] Then,



[math]\displaystyle{ F_{Y}(y) = P(Y \leq y) = P(\mathrm{log}(1 + e^{-X}) \leq y) = P(X \geq -\mathrm{log}(e^{y} - 1)).\, }[/math]




The last expression can be calculated in terms of the cumulative distribution of [math]\displaystyle{ X, }[/math] so



[math]\displaystyle{ \begin{align} F_Y(y) & = 1 - F_X(-\log(e^y - 1)) \\[5pt] & = 1 - \frac{1}{(1 + e^{\log(e^y - 1)})^\theta} \\[5pt] & = 1 - \frac{1}{(1 + e^y - 1)^\theta} \\[5pt] & = 1 - e^{-y \theta}. \end{align} }[/math]


which is the cumulative distribution function (CDF) of an exponential distribution.
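This result can be checked numerically: sample X by inverting its CDF, transform, and compare with the Exponential(θ) CDF. A hedged sketch (the choice θ = 2 and all names are ours):

```python
import math
import random

random.seed(2)
theta = 2.0

# Invert the CDF F_X(x) = (1 + e^{-x})^(-theta):
# u = F_X(x)  =>  x = -log(u**(-1/theta) - 1)
def sample_x(u):
    return -math.log(u ** (-1.0 / theta) - 1.0)

ys = [math.log(1.0 + math.exp(-sample_x(random.random())))
      for _ in range(100_000)]

# If Y ~ Exponential(theta): mean is 1/theta, CDF is 1 - exp(-theta*y).
mean_y = sum(ys) / len(ys)
ecdf_half = sum(y <= 0.5 for y in ys) / len(ys)  # compare to 1 - e^{-1}
```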


Example 3



There are several different senses in which random variables can be considered to be equivalent. Two random variables can be equal, equal almost surely, or equal in distribution.


Suppose [math]\displaystyle{ X }[/math] is a random variable with a standard normal distribution, whose density is


In increasing order of strength, the precise definition of these notions of equivalence is given below.


[math]\displaystyle{ f_X(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}. }[/math]


Consider the random variable [math]\displaystyle{ Y = X^2. }[/math] We can find the density using the above formula for a change of variables:


If the sample space is a subset of the real line, random variables X and Y are equal in distribution (denoted [math]\displaystyle{ X \stackrel{d}{=} Y }[/math]) if they have the same distribution functions:

[math]\displaystyle{ f_Y(y) = \sum_{i} f_X(g_{i}^{-1}(y)) \left| \frac{d g_{i}^{-1}(y)}{d y} \right|. }[/math]

[math]\displaystyle{ \operatorname{P}(X \le x) = \operatorname{P}(Y \le x)\quad\text{for all }x. }[/math]


In this case the change is not monotonic, because every value of [math]\displaystyle{ Y }[/math] has two corresponding values of [math]\displaystyle{ X }[/math] (one positive and negative). However, because of symmetry, both halves will transform identically, i.e.,

To be equal in distribution, random variables need not be defined on the same probability space. Two random variables having equal moment generating functions have the same distribution. This provides, for example, a useful method of checking equality of certain functions of independent, identically distributed (IID) random variables. However, the moment generating function exists only for distributions that have a defined Laplace transform.



[math]\displaystyle{ f_Y(y) = 2f_X(g^{-1}(y)) \left| \frac{d g^{-1}(y)}{d y} \right|. }[/math]


The inverse transformation is

Two random variables X and Y are equal almost surely (denoted X \; \stackrel{\text{a.s.}}{=} \; Y) if, and only if, the probability that they are different is zero:

两个随机变量 X 和 Y 绝对相等(表示 x; stackrel { text { a.s. }}{ = } ; y)当且仅当它们不同的概率为零:

[math]\displaystyle{ x = g^{-1}(y) = \sqrt{y} }[/math]

and its derivative is

\operatorname{P}(X \neq Y) = 0.

{ p }(x neq y) = 0。

[math]\displaystyle{ \frac{d g^{-1}(y)}{d y} = \frac{1}{2\sqrt{y}} . }[/math]


For all practical purposes in probability theory, this notion of equivalence is as strong as actual equality. It is associated to the following distance:

在概率论的所有实际目的中,这个等价的概念和实际上的等价一样强大。它与下列距离有关:

Then,


d_\infty(X,Y)=\operatorname{ess} \sup_\omega|X(\omega)-Y(\omega)|,

D _ infty (x,y) = { ess } sup _ omega | x (omega)-y (omega) | ,

[math]\displaystyle{ f_Y(y) = 2\frac{1}{\sqrt{2\pi}}e^{-y/2} \frac{1}{2\sqrt{y}} = \frac{1}{\sqrt{2\pi y}}e^{-y/2}. }[/math]


where "ess sup" represents the essential supremum in the sense of measure theory.

其中“ ess sup”代表了测量理论意义上的本质上界。

This is a chi-squared distribution with one degree of freedom.
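The derivation above is easy to check numerically. The sketch below (plain Python, no external libraries; the tolerances and truncation points are ad hoc choices) confirms that the derived density integrates to one and agrees with a Monte Carlo simulation of [math]\displaystyle{ Y = X^2 }[/math]:

```python
import math
import random

def f_Y(y):
    # Density of Y = X^2 derived above: chi-squared with one degree of freedom.
    return math.exp(-y / 2) / math.sqrt(2 * math.pi * y)

# Midpoint-rule integral of f_Y over (0, 50]; the tail beyond 50 is negligible.
dy = 0.001
total = sum(f_Y((k + 0.5) * dy) * dy for k in range(50_000))

# Monte Carlo estimate of P(Y <= 1) from simulated draws of X ~ N(0, 1),
# compared with the integral of f_Y over (0, 1].
random.seed(0)
n = 200_000
mc = sum(1 for _ in range(n) if random.gauss(0, 1) ** 2 <= 1.0) / n
exact = sum(f_Y((k + 0.5) * dy) * dy for k in range(1_000))
```

Both checks agree to about two decimal places; the small residual discrepancy comes from the integrable singularity of the density at [math]\displaystyle{ y = 0 }[/math], which the crude midpoint rule slightly underestimates.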


Example 4



Suppose [math]\displaystyle{ X }[/math] is a random variable with a normal distribution, whose density is

[math]\displaystyle{ f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-(x-\mu)^2/(2\sigma^2)}. }[/math]

Consider the random variable [math]\displaystyle{ Y = X^2. }[/math] We can find the density using the above formula for a change of variables:

[math]\displaystyle{ f_Y(y) = \sum_{i} f_X(g_{i}^{-1}(y)) \left| \frac{d g_{i}^{-1}(y)}{d y} \right|. }[/math]

In this case the change is not monotonic, because every value of [math]\displaystyle{ Y }[/math] has two corresponding values of [math]\displaystyle{ X }[/math] (one positive and one negative). Unlike the previous example, however, there is no symmetry here, and we have to compute the two distinct terms:

[math]\displaystyle{ f_Y(y) = f_X(g_1^{-1}(y))\left|\frac{d g_1^{-1}(y)}{d y} \right| +f_X(g_2^{-1}(y))\left| \frac{d g_2^{-1}(y)}{d y} \right|. }[/math]

The inverse transformation is

[math]\displaystyle{ x = g_{1,2}^{-1}(y) = \pm \sqrt{y} }[/math]

and its derivative is

[math]\displaystyle{ \frac{d g_{1,2}^{-1}(y)}{d y} = \pm \frac{1}{2\sqrt{y}} . }[/math]

Then,

[math]\displaystyle{ f_Y(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \frac{1}{2\sqrt{y}} \left(e^{-(\sqrt{y}-\mu)^2/(2\sigma^2)}+e^{-(-\sqrt{y}-\mu)^2/(2\sigma^2)}\right) . }[/math]

This is a noncentral chi-squared distribution with one degree of freedom.
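As a sanity check on this density (plain Python; the parameter values μ = 1, σ = 2 are arbitrary choices for illustration, not from the text), it should integrate to one, and the mean of [math]\displaystyle{ Y = X^2 }[/math] should equal [math]\displaystyle{ \operatorname{E}[X^2] = \mu^2 + \sigma^2 }[/math]:

```python
import math

MU, SIGMA = 1.0, 2.0  # illustrative parameter values (any choice works)

def f_Y(y):
    # Density of Y = X^2 for X ~ N(MU, SIGMA^2), as derived above.
    c = 1.0 / (math.sqrt(2 * math.pi * SIGMA ** 2) * 2 * math.sqrt(y))
    return c * (math.exp(-(math.sqrt(y) - MU) ** 2 / (2 * SIGMA ** 2))
                + math.exp(-(-math.sqrt(y) - MU) ** 2 / (2 * SIGMA ** 2)))

# Midpoint-rule integration over (0, 200]; the tail beyond 200 is negligible.
dy = 0.001
total = 0.0
mean = 0.0
for k in range(200_000):
    y = (k + 0.5) * dy
    p = f_Y(y) * dy
    total += p
    mean += y * p
# Expect total close to 1 and mean close to MU**2 + SIGMA**2 = 5.
```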


Some properties


  • The probability distribution of the sum of two independent random variables is the convolution of their individual distributions.
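For discrete random variables, this convolution property can be sketched directly (the fair-die example below is ours, added for illustration):

```python
from fractions import Fraction

def convolve(p, q):
    """PMF of X + Y for independent X, Y with PMFs p, q ({value: probability})."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0) + px * qy
    return out

die = {k: Fraction(1, 6) for k in range(1, 7)}  # fair six-sided die
two_dice = convolve(die, die)                   # distribution of the sum
# e.g. P(sum = 7) = 6/36 = 1/6, and the convolved PMF still sums to 1.
```

Using `Fraction` keeps the probabilities exact; with `float` probabilities the same code works up to rounding error.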


Equivalence of random variables



There are several different senses in which random variables can be considered to be equivalent. Two random variables can be equal, equal almost surely, or equal in distribution.


In increasing order of strength, the precise definition of these notions of equivalence is given below.


Equality in distribution



If the sample space is a subset of the real line, random variables X and Y are equal in distribution (denoted [math]\displaystyle{ X \stackrel{d}{=} Y }[/math]) if they have the same distribution functions:

[math]\displaystyle{ \operatorname{P}(X \le x) = \operatorname{P}(Y \le x)\quad\text{for all }x. }[/math]


To be equal in distribution, random variables need not be defined on the same probability space. Two random variables having equal moment generating functions have the same distribution. This provides, for example, a useful method of checking equality of certain functions of independent, identically distributed (IID) random variables. However, the moment generating function exists only for distributions that have a defined Laplace transform.
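A standard illustration (our example, sketched in plain Python): if [math]\displaystyle{ X }[/math] is standard normal, then [math]\displaystyle{ X }[/math] and [math]\displaystyle{ -X }[/math] are equal in distribution by symmetry, even though they are almost never equal as random variables. The empirical distribution functions of simulated draws agree up to sampling noise:

```python
import random

random.seed(1)
xs = [random.gauss(0, 1) for _ in range(100_000)]
ys = [-x for x in xs]  # -X has the same distribution as X, but X != -X a.s.

def ecdf(samples, t):
    # Empirical distribution function: fraction of samples <= t.
    return sum(1 for s in samples if s <= t) / len(samples)

# The two empirical distribution functions nearly coincide...
gaps = [abs(ecdf(xs, t) - ecdf(ys, t)) for t in (-1.0, 0.0, 1.0)]
# ...yet the realized values differ whenever x != 0.
different = sum(1 for x in xs if x != -x)
```

All entries of `gaps` are small, while `different` is (essentially) the whole sample: equality in distribution does not imply equality.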


Almost sure equality



Two random variables X and Y are equal almost surely (denoted [math]\displaystyle{ X \; \stackrel{\text{a.s.}}{=} \; Y }[/math]) if, and only if, the probability that they are different is zero:


[math]\displaystyle{ \operatorname{P}(X \neq Y) = 0. }[/math]


For all practical purposes in probability theory, this notion of equivalence is as strong as actual equality. It is associated with the following distance:


[math]\displaystyle{ d_\infty(X,Y)=\operatorname{ess} \sup_\omega|X(\omega)-Y(\omega)|, }[/math]


where "ess sup" represents the essential supremum in the sense of measure theory.
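To make the definition concrete, here is a small sketch (our construction, plain Python) of two random variables on the sample space [math]\displaystyle{ \Omega = [0, 1) }[/math] with the uniform measure that differ only on a probability-zero set, and are therefore equal almost surely without being equal as functions:

```python
import random

def X(omega):
    return omega

def Y(omega):
    # Y differs from X only at the single point omega = 0.5,
    # a set of probability zero under the uniform measure.
    return 42.0 if omega == 0.5 else omega

# Monte Carlo: the event {X != Y} is never observed in simulation.
random.seed(2)
omegas = [random.random() for _ in range(100_000)]  # uniform draws from [0, 1)
mismatches = sum(1 for w in omegas if X(w) != Y(w))
```

Here `mismatches` comes out zero, illustrating P(X ≠ Y) = 0 even though X ≠ Y pointwise.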

Equality

Finally, the two random variables X and Y are equal if they are equal as functions on their measurable space:

[math]\displaystyle{ X(\omega)=Y(\omega)\qquad\text{for all }\omega. }[/math]

This notion is typically the least useful in probability theory because in practice and in theory, the underlying measure space of the experiment is rarely explicitly characterized or even characterizable.


Convergence

A significant theme in mathematical statistics consists of obtaining convergence results for certain sequences of random variables; for instance the law of large numbers and the central limit theorem.

There are various senses in which a sequence [math]\displaystyle{ X_n }[/math] of random variables can converge to a random variable [math]\displaystyle{ X }[/math]. These are explained in the article on convergence of random variables.


Category:Statistical randomness
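The law of large numbers mentioned above is easy to illustrate by simulation (plain Python sketch; the fair-coin example is ours): the sample mean of IID Bernoulli(1/2) draws settles near the expected value 1/2 as the sample size grows.

```python
import random

random.seed(3)

def sample_mean(n):
    # Mean of n IID Bernoulli(1/2) draws (fair coin flips).
    return sum(random.randint(0, 1) for _ in range(n)) / n

# As n grows, the sample mean concentrates around E[X] = 0.5.
means = {n: sample_mean(n) for n in (10, 1_000, 100_000)}
```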


This page was moved from wikipedia:en:Random variable. Its edit history can be viewed at 随机变量/edithistory

  1. Blitzstein, Joe; Hwang, Jessica (2014). Introduction to Probability. CRC Press. ISBN 9781466575592.
  2. Steigerwald, Douglas G. "Economics 245A – Introduction to Measure Theory" (PDF). University of California, Santa Barbara. Retrieved April 26, 2013.
  3. "List of Probability and Statistics Symbols". Math Vault. 2020-04-26. Retrieved 2020-08-21.
  4. "Random Variables". www.mathsisfun.com. Retrieved 2020-08-21.
  5. Yates, Daniel S.; Moore, David S.; Starnes, Daren S. (2003). The Practice of Statistics (2nd ed.). New York: Freeman. ISBN 978-0-7167-4773-4. http://bcs.whfreeman.com/yates2e/.
  6. "Random Variables". www.stat.yale.edu. Retrieved 2020-08-21.
  7. Castañeda, L.; Arunachalam, V.; Dharmaraja, S. (2012). Introduction to Probability and Stochastic Processes with Applications. Wiley. p. 67. ISBN 9781118344941. https://books.google.com/books?id=zxXRn-Qmtk8C&pg=PA67.
  8. Bertsekas, Dimitri P.; Tsitsiklis, John N. (2002). Introduction to Probability. Belmont, Mass.: Athena Scientific. ISBN 188652940X. OCLC 51441829.