Binomial distribution

Revision as of 16:33, 8 August 2021 (Sunday)

This entry was translated by 南风 and reviewed by Smile.

Figure (Pascal's triangle; binomial distribution.svg): Binomial distribution for [math]\displaystyle{ p=0.5 }[/math] with n and k as in Pascal's triangle. The probability that a ball in a Galton box with 8 layers (n = 8) ends up in the central bin (k = 4) is [math]\displaystyle{ 70/256 }[/math].

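In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments. Each experiment asks a yes/no question and has a Boolean-valued outcome: success/yes/true/1 (with probability p) or failure/no/false/0 (with probability q = 1 − p).

A single success/failure experiment is also called a Bernoulli trial or Bernoulli experiment, and a sequence of outcomes is called a Bernoulli process. For a single trial, i.e. n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the binomial test of statistical significance.

The binomial distribution is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent, so the resulting distribution is a hypergeometric distribution rather than a binomial one. However, for N much larger than n, the binomial distribution remains a good approximation and is widely used.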
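The sampling claim above can be checked numerically. The following minimal Python sketch compares the hypergeometric pmf (draws without replacement) with the binomial pmf at p = K/N. The function names and the 30% success fraction are illustrative choices, not part of the original text.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of k successes in n draws with replacement."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeom_pmf(k: int, n: int, K: int, N: int) -> float:
    """Probability of k successes in n draws without replacement from a
    population of N items, K of which are successes. math.comb returns 0
    when k > K, so impossible counts get probability 0."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def max_abs_gap(n: int, K: int, N: int) -> float:
    """Largest pointwise difference between the two pmfs, taking p = K / N."""
    p = K / N
    return max(abs(hypergeom_pmf(k, n, K, N) - binom_pmf(k, n, p)) for k in range(n + 1))


if __name__ == "__main__":
    n = 10
    for N in (20, 200, 20_000):
        K = 3 * N // 10  # 30% of the population counts as a success
        print(N, max_abs_gap(n, K, N))
```

The gap shrinks as N grows relative to the fixed sample size n. This is the regime in which the binomial approximation is used.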

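Definitions

Probability mass function

In general, if the random variable X follows the binomial distribution with parameters n ∈ ℕ and p ∈ [0,1], we write X ~ B(n, p). The probability of getting exactly k successes in n independent Bernoulli trials is given by the probability mass function: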

[math]\displaystyle{ f(k,n,p) = \Pr(k;n,p) = \Pr(X = k) = \binom{n}{k}p^k(1-p)^{n-k} }[/math]

for k = 0, 1, 2, ..., n, where

[math]\displaystyle{ \binom{n}{k} =\frac{n!}{k!(n-k)!} }[/math]

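is the binomial coefficient, hence the name of the distribution. The formula can be understood as follows: k successes occur with probability p^k and n − k failures occur with probability (1 − p)^(n − k). However, the k successes can occur anywhere among the n trials, and there are [math]\displaystyle{ \binom{n}{k} }[/math] different ways of distributing k successes in a sequence of n trials.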
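The probability mass function translates directly into code. Below is a minimal Python sketch; the name `binomial_pmf` is an illustrative choice. It uses `math.comb` for the binomial coefficient and reproduces the Galton-box value 70/256 from the figure.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """f(k, n, p) = C(n, k) * p**k * (1 - p)**(n - k) for 0 <= k <= n, else 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if k < 0 or k > n:
        return 0.0
    # comb(n, k) counts the ways to place k successes among n trials;
    # the two powers give the probability of any one such sequence.
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


if __name__ == "__main__":
    # Galton box from the figure: n = 8, p = 1/2, central bin k = 4.
    print(binomial_pmf(4, 8, 0.5))  # 70/256 = 0.2734375
```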

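In creating reference tables for binomial distribution probabilities, the table is usually filled in only up to n/2 values. This is because for k > n/2, the probability can be calculated by its complement as

[math]\displaystyle{ f(k,n,p)=f(n-k,n,1-p). }[/math]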

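Looking at the expression f(k, n, p) as a function of k, there is a value of k that maximizes it. This value can be found by calculating

[math]\displaystyle{ \frac{f(k+1,n,p)}{f(k,n,p)}=\frac{(n-k)p}{(k+1)(1-p)} }[/math]

and comparing it to 1. There is always an integer M that satisfies^[1]

[math]\displaystyle{ (n+1)p-1 \leq M \lt (n+1)p. }[/math]

f(k, n, p) is monotone increasing for k < M and monotone decreasing for k > M, except when (n + 1)p is an integer. In that case, there are two values for which f is maximal: (n + 1)p and (n + 1)p − 1. M is the most probable outcome of the Bernoulli trials (that is, the most likely one, although it can still be unlikely overall) and is called the mode.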
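The ratio f(k+1)/f(k) also gives a way to tabulate the whole pmf without computing any factorials, and to locate the mode by stepping forward while the ratio exceeds 1. Below is a minimal Python sketch assuming 0 < p < 1; the function names are hypothetical. Note that the starting value (1 − p)^n can underflow for very large n, which a production version would need to handle, for example by working in log space.

```python
def binomial_pmf_table(n: int, p: float) -> list[float]:
    """All probabilities f(0..n; n, p) via the ratio
    f(k+1)/f(k) = (n - k) p / ((k + 1)(1 - p)), assuming 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("this sketch assumes 0 < p < 1")
    probs = [(1.0 - p) ** n]  # f(0) = (1 - p)^n
    for k in range(n):
        ratio = (n - k) * p / ((k + 1) * (1.0 - p))
        probs.append(probs[-1] * ratio)
    return probs


def first_mode(n: int, p: float) -> int:
    """Smallest k maximizing f(k, n, p): advance while f(k+1)/f(k) > 1."""
    k = 0
    while k < n and (n - k) * p / ((k + 1) * (1.0 - p)) > 1.0:
        k += 1
    return k
```

When (n + 1)p is an integer, the ratio equals exactly 1 at k = (n + 1)p − 1, so `first_mode` returns the smaller of the two modes.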

Example

Suppose a biased coin comes up heads with probability 0.3 when tossed. The probability of seeing exactly 4 heads in 6 tosses is

[math]\displaystyle{ f(4,6,0.3) = \binom{6}{4}0.3^4 (1-0.3)^{6-4}= 0.059535. }[/math]
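The arithmetic of the example can be reproduced exactly with rational numbers: 15 · 0.0081 · 0.49 = 0.059535. Here is a short Python check; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Exact rational arithmetic avoids any floating-point rounding in the check.
p = Fraction(3, 10)
f_4_6 = comb(6, 4) * p**4 * (1 - p) ** 2
print(f_4_6, float(f_4_6))  # 11907/200000 0.059535
```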

Cumulative distribution function

The cumulative distribution function can be expressed as:

[math]\displaystyle{ F(k;n,p) = \Pr(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} {n\choose i}p^i(1-p)^{n-i}, }[/math]

where [math]\displaystyle{ \lfloor k\rfloor }[/math] is the "floor" under k, i.e. the greatest integer less than or equal to k.

It can also be represented in terms of the regularized incomplete beta function, as follows:^[2]

[math]\displaystyle{ \begin{align} F(k;n,p) & = \Pr(X \le k) \\ &= I_{1-p}(n-k, k+1) \\ & = (n-k) {n \choose k} \int_0^{1-p} t^{n-k-1} (1-t)^k \, dt. \end{align} }[/math]

which is equivalent to the cumulative distribution function of the F-distribution:^[3]

[math]\displaystyle{ F(k;n,p) = F_{F\text{-distribution}}\left(x=\frac{1-p}{p}\frac{k+1}{n-k};d_1=2(n-k),d_2=2(k+1)\right). }[/math]

Some closed-form bounds for the cumulative distribution function are given below.
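To see that the sum and the incomplete-beta integral describe the same quantity, the following Python sketch evaluates both. The integral is approximated with composite Simpson's rule, which is an illustrative numerical choice rather than how libraries evaluate I_x(a, b). The sketch assumes 0 ≤ k < n so that the exponent n − k − 1 is non-negative; the function names are hypothetical.

```python
from math import comb


def binomial_cdf(k: float, n: int, p: float) -> float:
    """F(k; n, p) = sum_{i=0}^{floor(k)} C(n, i) p^i (1-p)^(n-i)."""
    upper = min(int(k // 1), n)  # floor(k), capped at n
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(upper + 1))


def cdf_via_beta_integral(k: int, n: int, p: float, steps: int = 2000) -> float:
    """(n - k) C(n, k) * integral_0^{1-p} t^(n-k-1) (1-t)^k dt for 0 <= k < n,
    approximated with composite Simpson's rule (steps must be even)."""
    a, b = 0.0, 1.0 - p
    h = (b - a) / steps

    def g(t: float) -> float:
        return t ** (n - k - 1) * (1 - t) ** k

    total = g(a) + g(b)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return (n - k) * comb(n, k) * total * h / 3
```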

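Properties

Expected value and variance

If X ~ B(n, p), that is, X is a binomially distributed random variable, n being the total number of experiments and p the probability of each experiment yielding a successful result, then the expected value of X is:

[math]\displaystyle{ \operatorname{E}[X] = np. }[/math]

This follows from the linearity of the expected value, together with the fact that X is the sum of n identical Bernoulli random variables, each with expected value p. In other words, if [math]\displaystyle{ X_1, \ldots, X_n }[/math] are identical (and independent) Bernoulli random variables with parameter p, then

[math]\displaystyle{ X = X_1 + \cdots + X_n }[/math]

and

[math]\displaystyle{ \operatorname{E}[X] = \operatorname{E}[X_1 + \cdots + X_n] = \operatorname{E}[X_1] + \cdots + \operatorname{E}[X_n] = p + \cdots + p = np. }[/math]

The variance is:

[math]\displaystyle{ \operatorname{Var}(X) = np(1 - p). }[/math]

This similarly follows from two facts: the variance of a sum of independent random variables is the sum of the variances, and each Bernoulli(p) variable has variance p(1 − p).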
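As a sanity check of E[X] = np and Var(X) = np(1 − p), the mean and variance can be computed directly from the pmf. This is a minimal Python sketch; the parameter values n = 20 and p = 0.35 are arbitrary illustrations.

```python
from math import comb


def binomial_mean_var(n: int, p: float) -> tuple[float, float]:
    """Mean and variance computed directly from the pmf, for comparison
    with the closed forms np and np(1 - p)."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * f for k, f in enumerate(pmf))
    var = sum((k - mean) ** 2 * f for k, f in enumerate(pmf))
    return mean, var


if __name__ == "__main__":
    n, p = 20, 0.35
    print(binomial_mean_var(n, p), (n * p, n * p * (1 - p)))
```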

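Higher moments

The first six central moments, defined as [math]\displaystyle{ \mu_c = \operatorname{E}\left[(X - \operatorname{E}[X])^c\right] }[/math], are given by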

[math]\displaystyle{ \begin{align} \mu_1 &= 0, \\ \mu_2 &= np(1-p),\\ \mu_3 &= np(1-p)(1-2p),\\ \mu_4 &= np(1-p)(1+(3n-6)p(1-p)),\\ \mu_5 &= np(1-p)(1-2p)(1+(10n-12)p(1-p)),\\ \mu_6 &= np(1-p)(1-30p(1-p)(1-4p(1-p))+5np(1-p)(5-26p(1-p))+15n^2 p^2 (1-p)^2). \end{align} }[/math]
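The closed forms above can be compared against moments summed directly over the pmf. In the Python sketch below, the shorthand q = 1 − p and r = pq is introduced only to keep the expressions short, and the function names are hypothetical.

```python
from math import comb


def central_moment(c: int, n: int, p: float) -> float:
    """mu_c = E[(X - np)^c], summed directly over the pmf."""
    mean = n * p
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) * (k - mean) ** c
        for k in range(n + 1)
    )


def closed_form_moments(n: int, p: float) -> list[float]:
    """mu_1..mu_6 from the formulas above, written with q = 1 - p and r = p q."""
    q = 1 - p
    r = p * q
    return [
        0.0,
        n * r,
        n * r * (1 - 2 * p),
        n * r * (1 + (3 * n - 6) * r),
        n * r * (1 - 2 * p) * (1 + (10 * n - 12) * r),
        n * r * (1 - 30 * r * (1 - 4 * r) + 5 * n * r * (5 - 26 * r) + 15 * n**2 * r**2),
    ]
```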

Mode

Usually the mode of a binomial B(n, p) distribution is equal to [math]\displaystyle{ \lfloor (n+1)p\rfloor }[/math], where [math]\displaystyle{ \lfloor\cdot\rfloor }[/math] is the floor function. However, when (n + 1)p is an integer and p is neither 0 nor 1, the distribution has two modes: (n + 1)p and (n + 1)p − 1. When p is equal to 0 or 1, the mode is 0 or n, respectively. These cases can be summarized as follows:

[math]\displaystyle{ \text{mode} = \begin{cases} \lfloor (n+1)\,p\rfloor & \text{if }(n+1)p\text{ is 0 or a noninteger}, \\ (n+1)\,p\ \text{ and }\ (n+1)\,p - 1 &\text{if }(n+1)p\in\{1,\dots,n\}, \\ n & \text{if }(n+1)p = n + 1. \end{cases} }[/math]

Proof: Let q = 1 − p and

[math]\displaystyle{ f(k)=\binom nk p^k q^{n-k}. }[/math]

For [math]\displaystyle{ p=0 }[/math], only [math]\displaystyle{ f(0) }[/math] has a nonzero value, with [math]\displaystyle{ f(0)=1 }[/math]. For [math]\displaystyle{ p=1 }[/math], we find [math]\displaystyle{ f(n)=1 }[/math] and [math]\displaystyle{ f(k)=0 }[/math] for [math]\displaystyle{ k\neq n }[/math]. This proves that the mode is 0 for [math]\displaystyle{ p=0 }[/math] and [math]\displaystyle{ n }[/math] for [math]\displaystyle{ p=1 }[/math].

Now let [math]\displaystyle{ 0 \lt p \lt 1 }[/math]. We find

[math]\displaystyle{ \frac{f(k+1)}{f(k)} = \frac{(n-k)p}{(k+1)(1-p)} }[/math].

From this it follows that

[math]\displaystyle{ \begin{align} k \gt (n+1)p-1 \Rightarrow f(k+1) \lt f(k) \\ k = (n+1)p-1 \Rightarrow f(k+1) = f(k) \\ k \lt (n+1)p-1 \Rightarrow f(k+1) \gt f(k) \end{align} }[/math]

So when [math]\displaystyle{ (n+1)p-1 }[/math] is an integer, both [math]\displaystyle{ (n+1)p-1 }[/math] and [math]\displaystyle{ (n+1)p }[/math] are modes. When [math]\displaystyle{ (n+1)p-1\notin \mathbb{Z} }[/math], only [math]\displaystyle{ \lfloor (n+1)p-1\rfloor+1=\lfloor (n+1)p\rfloor }[/math] is a mode.^[4]
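The case analysis can be checked against a brute-force argmax. The Python sketch below assumes p is given as a `fractions.Fraction`, so that the test "(n + 1)p is an integer" and the pmf comparisons are exact. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb, floor


def binomial_modes(n: int, p: Fraction) -> list[int]:
    """Mode(s) of B(n, p) following the case analysis above."""
    m = (n + 1) * p
    if p == 0:
        return [0]
    if p == 1:
        return [n]
    if m.denominator == 1:  # (n + 1) p in {1, ..., n}: two modes
        return [int(m) - 1, int(m)]
    return [floor(m)]


def brute_force_modes(n: int, p: Fraction) -> list[int]:
    """All k attaining the maximum of the exact pmf."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    best = max(pmf)
    return [k for k, f in enumerate(pmf) if f == best]
```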

Median

In general, there is no single formula for the median of a binomial distribution, and the median may not even be unique. However, several special results have been established:

If np is an integer, then the mean, median, and mode coincide and equal np.^[5]^[6]

Any median m must lie within the interval ⌊np⌋ ≤ m ≤ ⌈np⌉.^[7]

A median m cannot lie too far away from the mean: |m − np| ≤ min{ ln 2, max{p, 1 − p} }.^[8]

The median is unique and equal to m = round(np) when |m − np| ≤ min{p, 1 − p}, except when p = 1/2 and n is odd.

When p = 1/2 and n is odd, any number m in the interval ½(n − 1) ≤ m ≤ ½(n + 1) is a median of the binomial distribution. If p = 1/2 and n is even, then m = n/2 is the unique median.
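The listed results can be explored by enumerating integer medians exactly. The Python sketch below uses the usual definition of a median, P(X ≤ m) ≥ 1/2 and P(X ≥ m) ≥ 1/2, restricted to integers. It assumes p is a `Fraction`, and the function names are hypothetical.

```python
from fractions import Fraction
from math import ceil, comb, floor


def integer_medians(n: int, p: Fraction) -> list[int]:
    """Integers m with P(X <= m) >= 1/2 and P(X >= m) >= 1/2 (exact arithmetic)."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    half = Fraction(1, 2)
    return [m for m in range(n + 1) if sum(pmf[: m + 1]) >= half and sum(pmf[m:]) >= half]


def within_floor_ceil_bound(n: int, p: Fraction, m: int) -> bool:
    """Checks floor(np) <= m <= ceil(np)."""
    return floor(n * p) <= m <= ceil(n * p)
```

For example, B(5, 1/2) has integer medians 2 and 3 (every real number between them is also a median), while B(6, 1/2) has the unique median 3.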

Tail bounds

For k ≤ np, upper bounds can be derived for the lower tail of the cumulative distribution function [math]\displaystyle{ F(k;n,p)=\Pr(X \le k) }[/math], the probability that there are at most k successes. Since [math]\displaystyle{ \Pr(X \ge k) = F(n-k;n,1-p) }[/math], these bounds can also be seen as bounds for the upper tail of the cumulative distribution function for k ≥ np.

Hoeffding's inequality yields the simple bound

[math]\displaystyle{ F(k;n,p) \leq \exp\left(-2 n\left(p-\frac{k}{n}\right)^2\right), \! }[/math]

which is, however, not very tight. In particular, for p = 1 we have F(k;n,p) = 0 (for fixed k and n with k < n), but Hoeffding's bound evaluates to a positive constant.

A sharper bound can be obtained from the Chernoff bound:^[9]

[math]\displaystyle{ F(k;n,p) \leq \exp\left(-nD\left(\frac{k}{n}\parallel p\right)\right) }[/math]

where D(a ∥ p) is the relative entropy (Kullback–Leibler divergence) between an a-coin and a p-coin, i.e. between the Bernoulli(a) and Bernoulli(p) distributions:

[math]\displaystyle{ D(a\parallel p)=a\log\frac{a}{p}+(1-a)\log\frac{1-a}{1-p}. \! }[/math]

Asymptotically, this bound is reasonably tight; see^[9] for details.

One can also obtain lower bounds on the tail F(k;n,p), known as anti-concentration bounds. By approximating the binomial coefficient with Stirling's formula, it can be shown that^[10]

[math]\displaystyle{ F(k;n,p) \geq \frac{1}{\sqrt{8n\tfrac{k}{n}(1-\tfrac{k}{n})}} \exp\left(-nD\left(\frac{k}{n}\parallel p\right)\right), }[/math]

which implies the simpler but looser bound

[math]\displaystyle{ F(k;n,p) \geq \frac1{\sqrt{2n}} \exp\left(-nD\left(\frac{k}{n}\parallel p\right)\right). }[/math]

For p = 1/2 and even n, with k ≥ 3n/8, it is possible to make the denominator constant:

[math]\displaystyle{ F(k;n,\tfrac{1}{2}) \geq \frac{1}{15} \exp\left(- 16n \left(\frac{1}{2} -\frac{k}{n}\right)^2\right). \! }[/math]
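The ordering of these bounds can be checked numerically against the exact lower tail. The Python sketch below uses natural logarithms for D(a ∥ p) and assumes 0 < k < n and 0 < p < 1. The function names are illustrative; they do not refer to any library API.

```python
from math import comb, exp, log, sqrt


def binomial_cdf(k: int, n: int, p: float) -> float:
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def kl_bernoulli(a: float, p: float) -> float:
    """D(a || p) in nats; assumes 0 < a < 1 and 0 < p < 1."""
    return a * log(a / p) + (1 - a) * log((1 - a) / (1 - p))


def hoeffding_upper(k: int, n: int, p: float) -> float:
    return exp(-2 * n * (p - k / n) ** 2)


def chernoff_upper(k: int, n: int, p: float) -> float:
    return exp(-n * kl_bernoulli(k / n, p))


def stirling_lower(k: int, n: int, p: float) -> float:
    """Anti-concentration bound exp(-n D(k/n || p)) / sqrt(8 n a (1 - a)), a = k/n."""
    a = k / n
    return exp(-n * kl_bernoulli(a, p)) / sqrt(8 * n * a * (1 - a))


def half_p_lower(k: int, n: int) -> float:
    """The p = 1/2 bound with constant denominator (n even, k >= 3n/8)."""
    return exp(-16 * n * (0.5 - k / n) ** 2) / 15
```

For k below np, the expected ordering is stirling_lower ≤ F ≤ chernoff_upper ≤ hoeffding_upper. The last inequality holds because D(a ∥ p) ≥ 2(a − p)².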

Statistical inference

Estimation of parameters

When n is known, the parameter p can be estimated using the proportion of successes:

[math]\displaystyle{ \widehat{p} = \frac{x}{n}. }[/math]

This estimator is found using the maximum likelihood estimator and also the method of moments. It is unbiased and has uniformly minimum variance, as proven using the Lehmann–Scheffé theorem, since it is based on a minimal sufficient and complete statistic (i.e. x). It is also consistent, both in probability and in mean squared error (MSE).

Bayesian estimation with the beta distribution

A closed-form Bayes estimator for p also exists when the beta distribution is used as a conjugate prior distribution. When a general [math]\displaystyle{ \operatorname{Beta}(\alpha, \beta) }[/math] is used as the prior, the posterior mean estimator is:

[math]\displaystyle{ \widehat{p_b} = \frac{x+\alpha}{n+\alpha+\beta}. }[/math]

The Bayes estimator is asymptotically efficient, and as the sample size approaches infinity (n → ∞) it approaches the maximum likelihood estimate (MLE). The Bayes estimator is biased (how much depends on the prior), admissible, and consistent in probability.

For the special case of using the standard uniform distribution as a non-informative prior, [math]\displaystyle{ \operatorname{Beta}(\alpha=1, \beta=1) = U(0,1) }[/math], the posterior mean estimator becomes [math]\displaystyle{ \widehat{p_b} = \frac{x+1}{n+2} }[/math] (the posterior mode just leads back to the standard estimator). This method is called the rule of succession, and it was introduced in the 18th century by Pierre-Simon Laplace.

When estimating p with very rare events and a small n (e.g. if x = 0), the standard estimator gives [math]\displaystyle{ \widehat{p} = 0, }[/math] which is sometimes unrealistic and undesirable. In such cases there are various alternative estimators.^[11] One way is to use the Bayes estimator, leading to [math]\displaystyle{ \widehat{p_b} = \frac{1}{n+2} }[/math]. Another method is to use the upper bound of the confidence interval obtained using the rule of three: [math]\displaystyle{ \widehat{p_{\text{rule of 3}}} = \frac{3}{n}. }[/math]

Confidence intervals

Even for quite large values of n, the actual distribution of the mean is significantly nonnormal.^[12] Because of this problem, several methods to estimate confidence intervals have been proposed.

In the equations for confidence intervals below, the variables have the following meaning:

n₁ is the number of successes out of n, the total number of trials.

[math]\displaystyle{ \widehat{p\,} = \frac{n_1}{n} }[/math] is the proportion of successes.

[math]\displaystyle{ z }[/math] is the [math]\displaystyle{ 1 - \tfrac{1}{2}\alpha }[/math] quantile of a standard normal distribution (i.e. probit) corresponding to the target error rate [math]\displaystyle{ \alpha }[/math]. For example, for a 95% confidence level the error [math]\displaystyle{ \alpha }[/math] = 0.05, so [math]\displaystyle{ 1 - \tfrac{1}{2}\alpha }[/math] = 0.975 and [math]\displaystyle{ z }[/math] = 1.96.

Wald method

[math]\displaystyle{ \widehat{p\,} \pm z \sqrt{ \frac{ \widehat{p\,} ( 1 -\widehat{p\,} )}{ n } } . }[/math]

A continuity correction of 0.5/n may be added.

Agresti–Coull method^[13]

[math]\displaystyle{ \tilde{p} \pm z \sqrt{ \frac{ \tilde{p} ( 1 - \tilde{p} )}{ n + z^2 } } . }[/math]

Here the estimate of p is modified to

[math]\displaystyle{ \tilde{p}= \frac{ n_1 + \frac{1}{2} z^2}{ n + z^2 } }[/math]

Arcsine method^[14]

[math]\displaystyle{ \sin^2 \left(\arcsin \left(\sqrt{\widehat{p\,}}\right) \pm \frac{z}{2\sqrt{n}} \right). }[/math]
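The point estimates and the first three interval formulas above can be sketched in a few lines of Python. All function names are hypothetical, and z = 1.96 (a 95% level) is only a default. Clipping the arcsine angle to [0, π/2] is an added safeguard that the formula itself does not state; it keeps the bounds ordered when n₁ is 0 or n.

```python
from math import asin, pi, sin, sqrt


def point_estimates(x: int, n: int, alpha: float = 1.0, beta: float = 1.0) -> dict[str, float]:
    """Point estimates of p from x successes in n trials (n known).
    alpha = beta = 1 is the uniform prior, i.e. Laplace's rule of succession."""
    return {
        "mle": x / n,  # also the method-of-moments estimate
        "bayes_posterior_mean": (x + alpha) / (n + alpha + beta),
        "rule_of_3": 3 / n,  # upper 95% bound, meant for the x = 0 case
    }


def wald_interval(n1: int, n: int, z: float = 1.96) -> tuple[float, float]:
    p_hat = n1 / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def agresti_coull_interval(n1: int, n: int, z: float = 1.96) -> tuple[float, float]:
    p_tilde = (n1 + 0.5 * z * z) / (n + z * z)
    half_width = z * sqrt(p_tilde * (1 - p_tilde) / (n + z * z))
    return p_tilde - half_width, p_tilde + half_width


def arcsine_interval(n1: int, n: int, z: float = 1.96) -> tuple[float, float]:
    center = asin(sqrt(n1 / n))
    delta = z / (2 * sqrt(n))
    # Assumed safeguard: clip the angle to [0, pi/2] before squaring the sine.
    lower = max(center - delta, 0.0)
    upper = min(center + delta, pi / 2)
    return sin(lower) ** 2, sin(upper) ** 2
```

With x = 0, the Wald interval collapses to the single point 0, while the Agresti–Coull interval has positive width (its lower end can even fall below 0).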

Wilson (score) method

The notation in the formula below differs from the previous formulas in two respects:^[15]

First, z_x has a slightly different interpretation in the formula below: it has its ordinary meaning of "the x-th quantile of the standard normal distribution", rather than being a shorthand for "the (1 − x)-th quantile".

Second, this formula does not use a plus-minus to define the two bounds. Instead, one may use [math]\displaystyle{ z = z_{\alpha/2} }[/math] to get the lower bound, or [math]\displaystyle{ z = z_{1 - \alpha/2} }[/math] to get the upper bound. For example, for a 95% confidence level the error is [math]\displaystyle{ \alpha }[/math] = 0.05, so one gets the lower bound by using [math]\displaystyle{ z = z_{\alpha/2} = z_{0.025} = - 1.96 }[/math] and the upper bound by using [math]\displaystyle{ z = z_{1 - \alpha/2} = z_{0.975} = 1.96 }[/math].

[math]\displaystyle{ \frac{ \widehat{p\,} + \frac{z^2}{2n} + z \sqrt{ \frac{\widehat{p\,}(1 - \widehat{p\,})}{n} + \frac{z^2}{4 n^2} } }{ 1 + \frac{z^2}{n} } }[/math]^[16]

Comparison

The exact (Clopper–Pearson) method is the most conservative.^[12]

The Wald method, although commonly recommended in textbooks, is the most biased.
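The following Python sketch implements the Wilson formula with a signed z, as described above, next to the Wald interval for contrast. The function names and the default z = 1.96 are illustrative assumptions. For small counts, the Wilson bounds stay inside [0, 1] and always contain p̂, whereas the Wald lower bound can go negative.

```python
from math import sqrt


def wilson_bound(n1: int, n: int, z: float) -> float:
    """One Wilson bound: z = z_{alpha/2} (negative) gives the lower bound,
    z = z_{1-alpha/2} (positive) gives the upper bound."""
    p_hat = n1 / n
    center = p_hat + z * z / (2 * n)
    spread = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (center + spread) / (1 + z * z / n)


def wilson_interval(n1: int, n: int, z: float = 1.96) -> tuple[float, float]:
    return wilson_bound(n1, n, -abs(z)), wilson_bound(n1, n, abs(z))


def wald_interval(n1: int, n: int, z: float = 1.96) -> tuple[float, float]:
    p_hat = n1 / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half
```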

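Related distributions

Sums of binomials

If X ~ B(n, p) and Y ~ B(m, p) are independent binomial variables with the same probability p, then X + Y is again a binomial variable; its distribution is Z = X + Y ~ B(n + m, p):

[math]\displaystyle{ \begin{align} \Pr(Z=k) &= \sum_{i=0}^{k}\left[\binom{n}{i}p^{i}(1-p)^{n-i}\right]\left[\binom{m}{k-i}p^{k-i}(1-p)^{m-k+i}\right] \\ &= \binom{n+m}{k} p^k (1-p)^{n+m-k} \end{align} }[/math]

where the second step uses Vandermonde's identity [math]\displaystyle{ \sum_{i}\binom{n}{i}\binom{m}{k-i}=\binom{n+m}{k} }[/math].

However, if X ~ B(n, p₁) and Y ~ B(m, p₂) do not have the same probability, then the variance of the sum is smaller than the variance of a binomial variable distributed as [math]\displaystyle{ B(n+m, \bar{p}), }[/math] where [math]\displaystyle{ \bar{p} = \frac{np_1 + mp_2}{n+m} }[/math].

Ratio of two binomial distributions

The following result was first derived by Katz and coauthors in 1978.^[17]

Let X ~ B(n, p₁) and Y ~ B(m, p₂) be independent, and let T = (X/n)/(Y/m).

Then log(T) is approximately normally distributed with mean log(p₁/p₂) and variance ((1/p₁) − 1)/n + ((1/p₂) − 1)/m.

Bernoulli distribution

The Bernoulli distribution is a special case of the binomial distribution, where n = 1. Symbolically, X ~ B(1, p) has the same meaning as X ~ Bernoulli(p). Conversely, any binomial distribution B(n, p) is the distribution of the sum of n Bernoulli trials, each with the same probability p.^[18]

Poisson binomial distribution

The binomial distribution is a special case of the Poisson binomial distribution, or general binomial distribution, which is the distribution of a sum of n independent non-identical Bernoulli trials B(p_i).^[19]

Normal approximation

Figure (Binomial Distribution.svg): Binomial probability mass function and normal probability density function approximation for n = 6 and p = 0.5.

If n is large enough, then the skew of the distribution is not too great. In this case, a reasonable approximation to B(n, p) is given by the normal distribution

[math]\displaystyle{ \mathcal{N}(np,\,np(1-p)), }[/math]

and this basic approximation can be improved in a simple way by using a suitable continuity correction. The basic approximation generally improves as n increases (at least 20) and is better when p is not near 0 or 1. Various rules of thumb may be used to decide whether n is large enough and p is far enough from the extremes of 0 or 1.

For example, suppose one randomly samples n people out of a large population and asks them whether they agree with a certain statement. The proportion of people who agree will depend on the sample. If groups of n people were sampled repeatedly and truly randomly, the proportions would follow an approximate normal distribution, with mean equal to the true proportion p of agreement in the population and standard deviation [math]\displaystyle{ \sigma = \sqrt{\frac{p(1-p)}{n}} }[/math].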
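Two claims from this part can be checked directly: the convolution of B(n, p) and B(m, p) is B(n + m, p), and a continuity correction improves the normal approximation. The Python sketch below uses `math.erf` for the normal CDF. Adding +0.5 to k is one common form of continuity correction and is assumed here. The function names are hypothetical.

```python
from math import comb, erf, sqrt


def pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k) if 0 <= k <= n else 0.0


def convolve_binomials(n: int, m: int, p: float) -> list[float]:
    """Distribution of X + Y for independent X ~ B(n, p), Y ~ B(m, p), by direct convolution."""
    return [sum(pmf(i, n, p) * pmf(k - i, m, p) for i in range(k + 1)) for k in range(n + m + 1)]


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


def normal_approx_cdf(k: int, n: int, p: float, continuity: bool = True) -> float:
    """Approximate F(k; n, p) with N(np, np(1-p)), optionally evaluated at k + 0.5."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return normal_cdf(k + (0.5 if continuity else 0.0), mu, sigma)
```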

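Conditional binomials

If X ~ B(n, p) and Y | X ~ B(X, q) (the conditional distribution of Y, given X), then Y is a simple binomial random variable with distribution Y ~ B(n, pq).

For example, imagine throwing n balls into a basket U_X, then taking the balls that hit and throwing them into another basket U_Y. If p is the probability of hitting U_X, then X ~ B(n, p) is the number of balls that hit U_X. If q is the probability of hitting U_Y, then the number of balls that hit U_Y is Y ~ B(X, q), and therefore Y ~ B(n, pq).

Proof: Since [math]\displaystyle{ X \sim B(n, p) }[/math] and [math]\displaystyle{ Y \sim B(X, q) }[/math], by the law of total probability,

[math]\displaystyle{ \begin{align} \Pr[Y = m] &= \sum_{k = m}^{n} \Pr[Y = m \mid X = k] \Pr[X = k] \\[2pt] &= \sum_{k=m}^n \binom{n}{k} \binom{k}{m} p^k q^m (1-p)^{n-k} (1-q)^{k-m} \end{align} }[/math]

Since [math]\displaystyle{ \tbinom{n}{k} \tbinom{k}{m} = \tbinom{n}{m} \tbinom{n-m}{k-m}, }[/math] the equation above can be expressed as

[math]\displaystyle{ \Pr[Y = m] = \sum_{k=m}^{n} \binom{n}{m} \binom{n-m}{k-m} p^k q^m (1-p)^{n-k} (1-q)^{k-m} }[/math]

Factoring [math]\displaystyle{ p^k = p^m p^{k-m} }[/math] and pulling all the terms that do not depend on [math]\displaystyle{ k }[/math] out of the sum yields

[math]\displaystyle{ \begin{align} \Pr[Y = m] &= \binom{n}{m} p^m q^m \left( \sum_{k=m}^n \binom{n-m}{k-m} p^{k-m} (1-p)^{n-k} (1-q)^{k-m} \right) \\[2pt] &= \binom{n}{m} (pq)^m \left( \sum_{k=m}^n \binom{n-m}{k-m} \left(p(1-q)\right)^{k-m} (1-p)^{n-k} \right) \end{align} }[/math]

After substituting [math]\displaystyle{ i = k - m }[/math] in the expression above, we get

[math]\displaystyle{ \Pr[Y = m] = \binom{n}{m} (pq)^m \left( \sum_{i=0}^{n-m} \binom{n-m}{i} (p - pq)^i (1-p)^{n-m - i} \right) }[/math]

By the binomial theorem, the sum in parentheses equals [math]\displaystyle{ (p - pq + 1 - p)^{n-m} }[/math]. Substituting this in finally yields

[math]\displaystyle{ \begin{align} \Pr[Y=m] &= \binom{n}{m} (pq)^m (p - pq + 1 - p)^{n-m}\\[4pt] &= \binom{n}{m} (pq)^m (1-pq)^{n-m} \end{align} }[/math]

and thus [math]\displaystyle{ Y \sim B(n, pq) }[/math] as desired.

Poisson approximation

The binomial distribution converges towards the Poisson distribution as the number of trials goes to infinity while the product np remains fixed, or at least p tends to zero. Therefore, the Poisson distribution with parameter λ = np can be used as an approximation to B(n, p) if n is sufficiently large and p is sufficiently small. According to two rules of thumb, this approximation is good if n ≥ 20 and p ≤ 0.05, or if n ≥ 100 and np ≤ 10.

Concerning the accuracy of the Poisson approximation, see Novak, ch. 4, and references therein.

Beta distribution

Beta distributions provide a family of prior probability distributions for binomial distributions in Bayesian inference:

[math]\displaystyle{ P(p;\alpha,\beta) =\frac{p^{\alpha-1}(1-p)^{\beta-1}}{\mathrm{B}(\alpha,\beta)}. }[/math]

Given a uniform prior, the posterior distribution for the probability of success p, given n independent events with k observed successes, is a beta distribution, namely Beta(k + 1, n − k + 1).

History

This distribution was derived by Jacob Bernoulli. He considered the case where p = r/(r + s), where p is the probability of success and r and s are positive integers. Blaise Pascal had earlier considered the case where p = 1/2.

Random number generation

Methods for random number generation where the marginal distribution is a binomial distribution are well established.

One way to generate random samples from a binomial distribution is to use an inversion algorithm. To do so, one must calculate the probability Pr(X = k) for all values k from 0 through n. (These probabilities should sum to a value close to one, in order to encompass the entire sample space.) Then, using a pseudorandom number generator to produce samples uniformly between 0 and 1, one can transform the generated samples into discrete numbers using the probabilities calculated in the first step.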
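The inversion algorithm just described can be sketched with the standard library, tabulating the CDF once and mapping each uniform draw with `bisect`. The same block also checks the conditional-binomial identity exactly (by the law of total probability) and measures the Poisson approximation error for n = 100, p = 0.03. All function names and parameter values are illustrative assumptions.

```python
import random
from bisect import bisect_right
from itertools import accumulate
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    return exp(-lam) * lam**k / factorial(k)


def make_inversion_sampler(n: int, p: float, rng: random.Random):
    """Inversion algorithm: tabulate F(0), ..., F(n) once, then map each
    uniform draw u in [0, 1) to the smallest k with u < F(k)."""
    cdf = list(accumulate(binomial_pmf(k, n, p) for k in range(n + 1)))

    def draw() -> int:
        u = rng.random()
        # bisect_right returns the first index whose cumulative probability exceeds u;
        # min() guards against rounding that leaves F(n) slightly below 1.
        return min(bisect_right(cdf, u), n)

    return draw


def thinned_pmf(m: int, n: int, p: float, q: float) -> float:
    """Pr[Y = m] for X ~ B(n, p), Y | X ~ B(X, q), by the law of total probability."""
    return sum(binomial_pmf(m, k, q) * binomial_pmf(k, n, p) for k in range(m, n + 1))


if __name__ == "__main__":
    draw = make_inversion_sampler(20, 0.3, random.Random(12345))
    samples = [draw() for _ in range(10_000)]
    print(sum(samples) / len(samples))  # should be close to np = 6
```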

Category: Discrete distributions

Category: Factorial and binomial topics

Category: Conjugate prior distributions

Category: Exponential family distributions

This page was moved from wikipedia:en:Binomial distribution. Its edit history can be viewed at 二项分布/edithistory

References

[1] Feller, W. (1968). An Introduction to Probability Theory and Its Applications (Third ed.). New York: Wiley. p. 151 (theorem in section VI.3). https://archive.org/details/introductiontopr01wfel.
[2] Wadsworth, G. P. (1960). Introduction to Probability and Random Variables. New York: McGraw-Hill. p. 52. https://archive.org/details/introductiontopr0000wads.
[3] Jowett, G. H. (1963). "The Relationship Between the Binomial and F Distributions". Journal of the Royal Statistical Society D. 13 (1): 55–57. doi:10.2307/2986663. JSTOR 2986663.
[4] See also Nicolas, André (January 7, 2019). "Finding mode in Binomial distribution". Stack Exchange.
[5] Neumann, P. (1966). "Über den Median der Binomial- und Poissonverteilung". Wissenschaftliche Zeitschrift der Technischen Universität Dresden (in German). 19: 29–33.
[6] Lord, Nick (July 2010). "Binomial averages when the mean is an integer". The Mathematical Gazette. 94: 331–332.
[7] Kaas, R.; Buhrman, J. M. (1980). "Mean, Median and Mode in Binomial Distributions". Statistica Neerlandica. 34 (1): 13–18. doi:10.1111/j.1467-9574.1980.tb00681.x.
[8] Hamza, K. (1995). "The smallest uniform upper bound on the distance between the mean and the median of the binomial and Poisson distributions". Statistics & Probability Letters. 23: 21–25. doi:10.1016/0167-7152(94)00090-U.
[9] Arratia, R.; Gordon, L. (1989). "Tutorial on large deviations for the binomial distribution". Bulletin of Mathematical Biology. 51 (1): 125–131. doi:10.1007/BF02458840. PMID 2706397. S2CID 189884382.
[10] Ash, Robert B. (1990). Information Theory. Dover Publications. p. 115. https://archive.org/details/informationtheor00ashr.
[11] Razzaghi, Mehdi (2002). "On the estimation of binomial success probability with zero occurrence in sample". Journal of Modern Applied Statistical Methods. 1 (2): 326–332. doi:10.22237/jmasm/1036110000.
[12] Brown, Lawrence D.; Cai, T. Tony; DasGupta, Anirban (2001). "Interval Estimation for a Binomial Proportion". Statistical Science. 16 (2): 101–133. CiteSeerX 10.1.1.323.7752. doi:10.1214/ss/1009213286. Retrieved 2015-01-05.
[13] Agresti, Alan; Coull, Brent A. (May 1998). "Approximate is better than 'exact' for interval estimation of binomial proportions" (PDF). The American Statistician. 52 (2): 119–126. doi:10.2307/2685469. JSTOR 2685469. Retrieved 2015-01-05.
[14] Pires, M. A. (2002). "Confidence intervals for a binomial proportion: comparison of methods and software evaluation". In Klinke, S.; Ahrend, P.; Richter, L. (eds.). Proceedings of the Conference CompStat 2002. Short Communications and Posters. https://www.math.tecnico.ulisboa.pt/~apires/PDFs/AP_COMPSTAT02.pdf.
[15] Wilson, Edwin B. (June 1927). "Probable inference, the law of succession, and statistical inference" (PDF). Journal of the American Statistical Association. 22 (158): 209–212. doi:10.2307/2276774. JSTOR 2276774. Archived from the original (PDF) on 2015-01-13. Retrieved 2015-01-05.
[16] NIST/SEMATECH (2012). "Confidence intervals". Engineering Statistics Handbook. http://www.itl.nist.gov/div898/handbook/prc/section2/prc241.htm. Retrieved 2017-07-23.
[17] Katz, D.; Baptista, J.; Azen, S. P.; Pike, M. C. (1978). "Obtaining confidence intervals for the risk ratio in cohort studies". Biometrics. 34 (3): 469–474. doi:10.2307/2530610. JSTOR 2530610.
[18] Taboga, Marco. "Lectures on Probability Theory and Mathematical Statistics". statlect.com. Retrieved 18 December 2017.
[19] Wang, Y. H. (1993). "On the number of successes in independent trials" (PDF). Statistica Sinica. 3 (2): 295–312. Archived from the original (PDF) on 2016-03-03.
