Binomial distribution

Parameters	[math]\displaystyle{ n \in \{0, 1, 2, \ldots\} }[/math] (number of trials); [math]\displaystyle{ p \in [0,1] }[/math] (success probability of each trial); [math]\displaystyle{ q = 1 - p }[/math]
Support	[math]\displaystyle{ k \in \{0, 1, \ldots, n\} }[/math] (number of successes)
Probability mass function	[math]\displaystyle{ \binom{n}{k} p^k q^{n-k} }[/math]
Cumulative distribution function	[math]\displaystyle{ I_{q}(n - k, 1 + k) }[/math]
Mean	[math]\displaystyle{ np }[/math]
Median	[math]\displaystyle{ \lfloor np \rfloor }[/math] or [math]\displaystyle{ \lceil np \rceil }[/math]
Mode	[math]\displaystyle{ \lfloor (n + 1)p \rfloor }[/math] or [math]\displaystyle{ \lceil (n + 1)p \rceil - 1 }[/math]
Variance	[math]\displaystyle{ npq }[/math]
Skewness	[math]\displaystyle{ \frac{q-p}{\sqrt{npq}} }[/math]
Excess kurtosis	[math]\displaystyle{ \frac{1-6pq}{npq} }[/math]
Entropy	[math]\displaystyle{ \frac{1}{2} \log_2 (2\pi enpq) + O \left( \frac{1}{n} \right) }[/math]
Moment-generating function	[math]\displaystyle{ (q + pe^t)^n }[/math]
Characteristic function	[math]\displaystyle{ (q + pe^{it})^n }[/math]
Probability-generating function	[math]\displaystyle{ G(z) = [q + pz]^n }[/math]
Fisher information	[math]\displaystyle{ g_n(p) = \frac{n}{pq} }[/math] (for fixed [math]\displaystyle{ n }[/math])

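Figure: the binomial distribution as a function of n and k. The probability that a ball in a Galton box with 8 layers (n = 8) ends up in the central bin (k = 4) is [math]\displaystyle{ 70/256 }[/math].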

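In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments. Each experiment asks a yes/no question and has a Boolean-valued outcome: success/yes/true/1 (with probability p) or failure/no/false/0 (with probability q = 1 − p).

A single success/failure experiment is also called a Bernoulli trial or Bernoulli experiment, and a sequence of such outcomes is called a Bernoulli process. For a single trial, i.e. n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the binomial test of statistical significance.

The binomial distribution is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent, so the resulting distribution is a hypergeometric distribution rather than a binomial one. However, for N much larger than n, the binomial distribution remains a good approximation and is widely used.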
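The quality of the binomial approximation for sampling without replacement can be checked numerically. The sketch below is illustrative only: the function names and the population sizes are assumptions chosen for the example, and the hypergeometric formula used is the standard one, which the text above does not spell out.

```python
from math import comb

def binom_pmf(k, n, p):
    """Pr(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def hypergeom_pmf(k, N, K, n):
    """Pr(k successes) when drawing n items without replacement
    from a population of N items, K of which count as successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def max_abs_gap(N, K, n):
    """Largest pointwise difference between the two PMFs, with p = K / N."""
    p = K / N
    return max(abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p))
               for k in range(n + 1))

# Same success fraction (30%) and sample size, two population sizes.
small_population_gap = max_abs_gap(N=50, K=15, n=10)
large_population_gap = max_abs_gap(N=50_000, K=15_000, n=10)
print(small_population_gap, large_population_gap)
```

With the sample size held fixed, the gap shrinks as the population grows, which is the sense in which the binomial distribution approximates the hypergeometric one when N is much larger than n.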

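Definition

Probability mass function

In general, if the random variable X follows the binomial distribution with parameters n ∈ ℕ and p ∈ [0,1], we write X ~ B(n, p). The probability of getting exactly k successes in n independent Bernoulli trials is given by the probability mass function:

[math]\displaystyle{ f(k,n,p) = \Pr(k;n,p) = \Pr(X = k) = \binom{n}{k}p^k(1-p)^{n-k} }[/math]

for k = 0, 1, 2, ..., n, where

[math]\displaystyle{ \binom{n}{k} =\frac{n!}{k!(n-k)!} }[/math]

is the binomial coefficient, hence the name of the distribution. The formula can be understood as follows: k successes occur with probability p^k, and n − k failures occur with probability (1 − p)^(n − k). However, the k successes can occur anywhere among the n trials, and there are [math]\displaystyle{ \binom{n}{k} }[/math] different ways of distributing k successes in a sequence of n trials.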

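In reference tables of binomial probabilities, the table is usually filled in only up to n/2 values. This is because for k > n/2 the probability can be calculated from its complement:

[math]\displaystyle{ f(k,n,p)=f(n-k,n,1-p). }[/math]

Viewed as a function of k, the expression f(k, n, p) attains a maximum at some value of k. This value can be found by calculating

[math]\displaystyle{ \frac{f(k+1,n,p)}{f(k,n,p)}=\frac{(n-k)p}{(k+1)(1-p)} }[/math]

and comparing it to 1. There is always an integer M that satisfies^[1]

[math]\displaystyle{ (n+1)p-1 \leq M \lt (n+1)p. }[/math]

f(k, n, p) is monotonically increasing for k < M and monotonically decreasing for k > M, except when (n + 1)p is an integer. In that case there are two values at which f is maximal: (n + 1)p and (n + 1)p − 1. M is the most probable outcome of the Bernoulli trials (the most likely one, although it may still be unlikely overall) and is called the mode.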

Example

Suppose a biased coin comes up heads with probability 0.3 when tossed. The probability of seeing exactly 4 heads in 6 tosses is

[math]\displaystyle{ f(4,6,0.3) = \binom{6}{4}0.3^4 (1-0.3)^{6-4}= 0.059535. }[/math]
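The probability mass function translates almost directly into code. The following is a minimal sketch using only the Python standard library; the function name `binom_pmf` is an assumption made for illustration.

```python
from math import comb, isclose

def binom_pmf(k: int, n: int, p: float) -> float:
    """Pr(X = k) for X ~ B(n, p); returns 0 outside the support."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The biased-coin example: 4 heads in 6 tosses with p = 0.3.
print(round(binom_pmf(4, 6, 0.3), 6))      # 0.059535

# The Galton box: central bin of an 8-layer box with a fair split.
print(isclose(binom_pmf(4, 8, 0.5), 70 / 256))

# The complement identity f(k, n, p) = f(n - k, n, 1 - p).
print(isclose(binom_pmf(2, 6, 0.3), binom_pmf(4, 6, 0.7)))
```

For large n the direct product can underflow or lose precision; working with logarithms of the terms is a common alternative, not shown here.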

Cumulative distribution function

The cumulative distribution function can be expressed as:

[math]\displaystyle{ F(k;n,p) = \Pr(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} {n\choose i}p^i(1-p)^{n-i}, }[/math]

where [math]\displaystyle{ \lfloor k\rfloor }[/math] is the floor of k, i.e. the greatest integer less than or equal to k.

It can also be represented in terms of the regularized incomplete beta function, as follows:^[2]

[math]\displaystyle{ \begin{align} F(k;n,p) & = \Pr(X \le k) \\ &= I_{1-p}(n-k, k+1) \\ & = (n-k) {n \choose k} \int_0^{1-p} t^{n-k-1} (1-t)^k \, dt. \end{align} }[/math]

This is equivalent to the cumulative distribution function of the F-distribution:^[3]

[math]\displaystyle{ F(k;n,p) = F_{F\text{-distribution}}\left(x=\frac{1-p}{p}\frac{k+1}{n-k};d_1=2(n-k),d_2=2(k+1)\right). }[/math]

Some closed-form bounds for the cumulative distribution function are given below.
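As a rough numerical check, the summation form of the cumulative distribution function can be compared with the integral form. The sketch below assumes an integer k with 0 ≤ k < n for the integral version and uses composite Simpson's rule, which is a choice made here for illustration rather than anything prescribed by the text; all function names are hypothetical.

```python
from math import comb, floor, isclose

def binom_cdf(k, n, p):
    """Pr(X <= k) by direct summation of the PMF; k may be any real number."""
    upper = min(floor(k), n)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(upper + 1))

def binom_cdf_integral(k, n, p, steps=2000):
    """Pr(X <= k) via (n-k) C(n,k) * integral_0^{1-p} t^(n-k-1) (1-t)^k dt,
    evaluated with composite Simpson's rule (steps must be even, 0 <= k < n)."""
    a, b = 0.0, 1.0 - p
    h = (b - a) / steps

    def integrand(t):
        return t**(n - k - 1) * (1 - t)**k

    total = integrand(a) + integrand(b)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * integrand(a + j * h)
    return (n - k) * comb(n, k) * total * h / 3

print(binom_cdf(3, 10, 0.3), binom_cdf_integral(3, 10, 0.3))
```

The two values agree to within the accuracy of the numerical integration. A non-integer argument such as `binom_cdf(3.7, 10, 0.3)` gives the same result as k = 3, reflecting the floor in the summation limit.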

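Properties

Expected value and variance

If X ~ B(n, p), that is, X is a binomially distributed random variable, with n the total number of experiments and p the probability that each experiment yields a success, then the expected value of X is:

[math]\displaystyle{ \operatorname{E}[X] = np. }[/math]

This follows from the linearity of the expected value together with the fact that X is the sum of n identical Bernoulli random variables, each with expected value p. In other words, if [math]\displaystyle{ X_1, \ldots, X_n }[/math] are identical (and independent) Bernoulli random variables with parameter p, then

[math]\displaystyle{ X = X_1 + \cdots + X_n }[/math]

and

[math]\displaystyle{ \operatorname{E}[X] = \operatorname{E}[X_1 + \cdots + X_n] = \operatorname{E}[X_1] + \cdots + \operatorname{E}[X_n] = p + \cdots + p = np. }[/math]

The variance is:

[math]\displaystyle{ \operatorname{Var}(X) = np(1 - p). }[/math]

This similarly follows from the fact that the variance of a sum of independent random variables is the sum of the variances; each Bernoulli term has variance p(1 − p).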
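The two formulas can be confirmed by summing over the probability mass function directly. The sketch below uses exact rational arithmetic from the standard library so that the comparison is free of rounding; the values n = 10 and p = 3/10 are arbitrary choices for illustration.

```python
from fractions import Fraction
from math import comb

def exact_pmf(n, p):
    """Exact probabilities Pr(X = k), k = 0..n, for a rational p."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

n, p = 10, Fraction(3, 10)
pmf = exact_pmf(n, p)

mean = sum(k * f for k, f in enumerate(pmf))
variance = sum((k - mean) ** 2 * f for k, f in enumerate(pmf))

print(mean, variance)  # 3 21/10
```

Here np = 3 and np(1 − p) = 21/10, matching the values obtained by brute-force summation.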

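Higher moments

The first 6 central moments, defined as [math]\displaystyle{ \mu_c = \operatorname{E}\left[(X - \operatorname{E}[X])^c\right] }[/math], are given by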

[math]\displaystyle{ \begin{align} \mu_1 &= 0, \\ \mu_2 &= np(1-p),\\ \mu_3 &= np(1-p)(1-2p),\\ \mu_4 &= np(1-p)(1+(3n-6)p(1-p)),\\ \mu_5 &= np(1-p)(1-2p)(1+(10n-12)p(1-p)),\\ \mu_6 &= np(1-p)(1-30p(1-p)(1-4p(1-p))+5np(1-p)(5-26p(1-p))+15n^2 p^2 (1-p)^2). \end{align} }[/math]
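These closed forms can be checked against central moments computed directly from the probability mass function. The sketch below again uses exact fractions; the helper name `central_moment` and the test values n = 7, p = 1/3 are assumptions made for the example.

```python
from fractions import Fraction
from math import comb

def central_moment(c, n, p):
    """E[(X - np)^c] for X ~ B(n, p), by direct summation over the PMF."""
    mean = n * p
    return sum((k - mean) ** c * comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))

n, p = 7, Fraction(1, 3)
v = p * (1 - p)  # shorthand for p(1 - p)

closed_forms = {
    1: 0,
    2: n * v,
    3: n * v * (1 - 2 * p),
    4: n * v * (1 + (3 * n - 6) * v),
    5: n * v * (1 - 2 * p) * (1 + (10 * n - 12) * v),
    6: n * v * (1 - 30 * v * (1 - 4 * v)
                + 5 * n * v * (5 - 26 * v)
                + 15 * n**2 * v**2),
}

# True when every closed form matches the brute-force value exactly.
all_match = all(central_moment(c, n, p) == value
                for c, value in closed_forms.items())
print(all_match)
```

Because the arithmetic is exact, any mismatch would indicate an error in a formula rather than floating-point noise.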

Mode

Usually the mode of a binomial B(n, p) distribution is equal to [math]\displaystyle{ \lfloor (n+1)p\rfloor }[/math], where [math]\displaystyle{ \lfloor\cdot\rfloor }[/math] is the floor function. However, when (n + 1)p is an integer and p is neither 0 nor 1, the distribution has two modes: (n + 1)p and (n + 1)p − 1. When p is equal to 0 or 1, the mode is 0 or n, respectively. These cases can be summarized as follows:

[math]\displaystyle{ \text{mode} = \begin{cases} \lfloor (n+1)\,p\rfloor & \text{if }(n+1)p\text{ is 0 or a noninteger}, \\ (n+1)\,p\ \text{ and }\ (n+1)\,p - 1 &\text{if }(n+1)p\in\{1,\dots,n\}, \\ n & \text{if }(n+1)p = n + 1. \end{cases} }[/math]

Proof: Let

[math]\displaystyle{ f(k)=\binom nk p^k q^{n-k}. }[/math]

For [math]\displaystyle{ p=0 }[/math], only [math]\displaystyle{ f(0) }[/math] is nonzero, with [math]\displaystyle{ f(0)=1 }[/math]. For [math]\displaystyle{ p=1 }[/math], we find [math]\displaystyle{ f(n)=1 }[/math] and [math]\displaystyle{ f(k)=0 }[/math] for [math]\displaystyle{ k\neq n }[/math]. This proves that the mode is 0 for [math]\displaystyle{ p=0 }[/math] and [math]\displaystyle{ n }[/math] for [math]\displaystyle{ p=1 }[/math].

Now let [math]\displaystyle{ 0 \lt p \lt 1 }[/math]. We find

[math]\displaystyle{ \frac{f(k+1)}{f(k)} = \frac{(n-k)p}{(k+1)(1-p)}. }[/math]

From this it follows that

[math]\displaystyle{ \begin{align} k \gt (n+1)p-1 &\Rightarrow f(k+1) \lt f(k) \\ k = (n+1)p-1 &\Rightarrow f(k+1) = f(k) \\ k \lt (n+1)p-1 &\Rightarrow f(k+1) \gt f(k) \end{align} }[/math]

So when [math]\displaystyle{ (n+1)p-1 }[/math] is an integer, both [math]\displaystyle{ (n+1)p-1 }[/math] and [math]\displaystyle{ (n+1)p }[/math] are modes. In the case that [math]\displaystyle{ (n+1)p-1\notin \mathbb{Z} }[/math], only [math]\displaystyle{ \lfloor (n+1)p-1\rfloor+1=\lfloor (n+1)p\rfloor }[/math] is a mode.^[4]
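The case analysis for the mode can be written as a small function and compared with a brute-force search over the probability mass function. This is a sketch under the assumption that p is supplied as a `fractions.Fraction`, so that the test "is (n + 1)p an integer" is exact; with floating-point p that test would be unreliable. The function names are illustrative.

```python
from fractions import Fraction
from math import comb, floor

def binom_modes(n, p):
    """Modes of B(n, p) from the closed-form case analysis; p is a Fraction."""
    t = (n + 1) * p
    if t == 0 or t.denominator != 1:   # (n+1)p is 0 or a noninteger
        return [floor(t)]
    if t == n + 1:                     # only possible when p == 1
        return [n]
    return [int(t) - 1, int(t)]        # (n+1)p is an integer in {1, ..., n}

def brute_force_modes(n, p):
    """All k that maximise the PMF, found by exhaustive comparison."""
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    best = max(pmf)
    return [k for k, f in enumerate(pmf) if f == best]

print(binom_modes(9, Fraction(1, 2)))   # (n+1)p = 5 is an integer: two modes
print(binom_modes(10, Fraction(3, 10))) # (n+1)p = 3.3: single mode
```

For example, B(9, 1/2) has the two modes 4 and 5, while B(10, 3/10) has the single mode 3.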

Median

In general, there is no single formula for the median of a binomial distribution, and the median may even be non-unique. However, several special results have been established:

If np is an integer, then the mean, median, and mode coincide and equal np.^[5]^[6]

Any median m must lie within the interval ⌊np⌋ ≤ m ≤ ⌈np⌉.^[7]

A median m cannot lie too far away from the mean: |m − np| ≤ min{ ln 2, max{p, 1 − p} }.

The median is unique and equal to m = round(np) when |m − np| ≤ min{p, 1 − p} (except for the case when p = 1/2 and n is odd).

When p = 1/2 and n is odd, any number m in the interval (n − 1)/2 ≤ m ≤ (n + 1)/2 is a median of the binomial distribution. If p = 1/2 and n is even, then m = n/2 is the unique median.
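Several of these statements can be checked by enumerating medians directly from the definition. The sketch below only looks for integer medians, that is, integers m with Pr(X ≤ m) ≥ 1/2 and Pr(X ≥ m) ≥ 1/2, and it assumes a rational p so that comparisons with 1/2 are exact. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def binom_integer_medians(n, p):
    """All integers m with Pr(X <= m) >= 1/2 and Pr(X >= m) >= 1/2."""
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    half = Fraction(1, 2)
    medians = []
    for m in range(n + 1):
        lower_tail = sum(pmf[: m + 1])   # Pr(X <= m)
        upper_tail = sum(pmf[m:])        # Pr(X >= m)
        if lower_tail >= half and upper_tail >= half:
            medians.append(m)
    return medians

print(binom_integer_medians(10, Fraction(3, 10)))  # np = 3 is an integer
print(binom_integer_medians(5, Fraction(1, 2)))    # p = 1/2, n odd
print(binom_integer_medians(6, Fraction(1, 2)))    # p = 1/2, n even
```

The three calls return [3], [2, 3], and [3] respectively: the first agrees with the mean np = 3, the second shows the non-unique case with endpoints (n − 1)/2 and (n + 1)/2, and the third gives the unique median n/2.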

Tail bounds

For k ≤ np, upper bounds can be derived for the lower tail of the cumulative distribution function [math]\displaystyle{ F(k;n,p)=\Pr(X \le k) }[/math], the probability that there are at most k successes. Since [math]\displaystyle{ \Pr(X \ge k) = F(n-k;n,1-p) }[/math], these bounds can also be read as bounds for the upper tail of the cumulative distribution function for k ≥ np.

Hoeffding's inequality yields the simple bound

[math]\displaystyle{ F(k;n,p) \leq \exp\left(-2 n\left(p-\frac{k}{n}\right)^2\right), \! }[/math]

which is, however, not very tight. In particular, for p = 1 we have F(k;n,p) = 0 (for fixed k and n with k < n), but Hoeffding's bound evaluates to a positive constant.

A sharper bound can be obtained from the Chernoff bound:

[math]\displaystyle{ F(k;n,p) \leq \exp\left(-nD\left(\frac{k}{n}\parallel p\right)\right) }[/math]

where D(a || p) is the relative entropy between the parameters a and p, i.e. the divergence between the Bernoulli(a) and Bernoulli(p) probability distributions:

[math]\displaystyle{ D(a\parallel p)=(a)\log\frac{a}{p}+(1-a)\log\frac{1-a}{1-p}. \! }[/math]

Asymptotically, this bound is reasonably tight; see ^[8] for details.

One can also obtain lower bounds on the tail [math]\displaystyle{ F(k;n,p) }[/math], known as anti-concentration bounds. By approximating the binomial coefficient with Stirling's formula it can be shown that

[math]\displaystyle{ F(k;n,p) \geq \frac{1}{\sqrt{8n\tfrac{k}{n}(1-\tfrac{k}{n})}} \exp\left(-nD\left(\frac{k}{n}\parallel p\right)\right), }[/math]

which implies the simpler but looser bound

[math]\displaystyle{ F(k;n,p) \geq \frac1{\sqrt{2n}} \exp\left(-nD\left(\frac{k}{n}\parallel p\right)\right). }[/math]

For p = 1/2, even n, and k ≥ 3n/8, the denominator can be made constant:

[math]\displaystyle{ F(k;n,\tfrac{1}{2}) \geq \frac{1}{15} \exp\left(- 16n \left(\frac{1}{2} -\frac{k}{n}\right)^2\right). \! }[/math]
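To get a feel for how tight these bounds are, they can be evaluated next to the exact lower-tail probability. The sketch below uses natural logarithms for the relative entropy and assumes 0 < p < 1 and 0 < k < n with k ≤ np; the function names and the example values n = 100, p = 0.5, k = 40 are choices made for illustration.

```python
from math import comb, exp, log, sqrt

def binom_cdf(k, n, p):
    """Exact Pr(X <= k) for integer k, by summation."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def relative_entropy(a, p):
    """D(a || p) between Bernoulli(a) and Bernoulli(p), in nats (0 < p < 1)."""
    total = 0.0
    if a > 0:
        total += a * log(a / p)
    if a < 1:
        total += (1 - a) * log((1 - a) / (1 - p))
    return total

def hoeffding_upper(k, n, p):
    return exp(-2 * n * (p - k / n) ** 2)

def chernoff_upper(k, n, p):
    return exp(-n * relative_entropy(k / n, p))

def stirling_lower(k, n, p):
    a = k / n
    return exp(-n * relative_entropy(a, p)) / sqrt(8 * n * a * (1 - a))

n, p, k = 100, 0.5, 40
exact = binom_cdf(k, n, p)
print(stirling_lower(k, n, p), exact, chernoff_upper(k, n, p), hoeffding_upper(k, n, p))
```

The four printed values appear in increasing order: the anti-concentration lower bound, the exact tail probability, the Chernoff bound, and the slightly weaker Hoeffding bound.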

Statistical inference

Estimation of parameters

See also: Bayesian inference with the Beta distribution

When n is known, the parameter p can be estimated using the proportion of successes: [math]\displaystyle{ \widehat{p} = \frac{x}{n}. }[/math] This estimator is obtained both by maximum likelihood estimation and by the method of moments. It is unbiased, consistent, and has uniformly minimum variance, as shown by the Lehmann-Scheffé theorem, since it is based on a minimal sufficient and complete statistic (namely x). It is consistent both in probability and in mean squared error (MSE).

A closed-form Bayes estimator for p also exists when the Beta distribution is used as a conjugate prior distribution. With a general [math]\displaystyle{ \operatorname{Beta}(\alpha, \beta) }[/math] prior, the posterior mean estimator is [math]\displaystyle{ \widehat{p_b} = \frac{x+\alpha}{n+\alpha+\beta} }[/math]. The Bayes estimator is asymptotically efficient, and as the sample size approaches infinity (n → ∞) it approaches the maximum likelihood estimate (MLE). The Bayes estimator is biased (by how much depends on the prior), admissible, and consistent in probability.

For the special case of the standard uniform distribution as a non-informative prior ([math]\displaystyle{ \operatorname{Beta}(\alpha=1, \beta=1) = U(0,1) }[/math]), the posterior mean estimator becomes [math]\displaystyle{ \widehat{p_b} = \frac{x+1}{n+2} }[/math] (the posterior mode simply recovers the standard estimator). This method is called the rule of succession, which was introduced in the 18th century by Pierre-Simon Laplace.

When estimating p from very rare events and a small n (for example, if x = 0), the standard estimator gives [math]\displaystyle{ \widehat{p} = 0 }[/math], which is sometimes unrealistic and undesirable.^[9] In such cases there are various alternative estimators. One way is to use the Bayes estimator, leading to [math]\displaystyle{ \widehat{p_b} = \frac{1}{n+2} }[/math]. Another is to use the upper bound of the confidence interval obtained from the rule of three: [math]\displaystyle{ \widehat{p_{\text{rule of 3}}} = \frac{3}{n} }[/math].

Confidence intervals

Even for quite large values of n, the actual distribution of the mean is significantly non-normal.^[10] Because of this problem, several methods for estimating confidence intervals have been proposed.

In the confidence interval formulas below, the variables have the following meaning:

n₁ is the number of successes out of n, the total number of trials.

[math]\displaystyle{ \widehat{p\,} = \frac{n_1}{n} }[/math] is the proportion of successes.

[math]\displaystyle{ z }[/math] is the [math]\displaystyle{ 1 - \tfrac{1}{2}\alpha }[/math] quantile of a standard normal distribution (i.e., the probit) corresponding to the target error rate [math]\displaystyle{ \alpha }[/math]. For example, for a 95% confidence level the error rate is [math]\displaystyle{ \alpha }[/math] = 0.05, so [math]\displaystyle{ 1 - \tfrac{1}{2}\alpha }[/math] = 0.975 and [math]\displaystyle{ z }[/math] = 1.96.

Wald method

[math]\displaystyle{ \widehat{p\,} \pm z \sqrt{ \frac{ \widehat{p\,} ( 1 -\widehat{p\,} )}{ n } } }[/math]

A continuity correction of 0.5/n may be added.

Agresti-Coull method

[math]\displaystyle{ \tilde{p} \pm z \sqrt{ \frac{ \tilde{p} ( 1 - \tilde{p} )}{ n + z^2 } } }[/math]^[11]

Here the estimate of p is modified to

[math]\displaystyle{ \tilde{p}= \frac{ n_1 + \frac{1}{2} z^2}{ n + z^2 } }[/math]

Arcsine method

[math]\displaystyle{ \sin^2 \left(\arcsin \left(\sqrt{\widehat{p\,}}\right) \pm \frac{z}{2\sqrt{n}} \right). }[/math]
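The Wald, Agresti-Coull, and arcsine intervals can be sketched in a few lines using only the Python standard library. The code below is an illustration rather than a reference implementation: the function names are hypothetical, the quantile z is taken from `statistics.NormalDist`, no continuity correction is applied, and the arcsine interval is not clamped, so it should be treated with care when the proportion of successes is very close to 0 or 1.

```python
from math import asin, sin, sqrt
from statistics import NormalDist

def z_value(alpha):
    """The 1 - alpha/2 quantile of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def wald_interval(n1, n, alpha=0.05):
    z = z_value(alpha)
    p_hat = n1 / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def agresti_coull_interval(n1, n, alpha=0.05):
    z = z_value(alpha)
    p_tilde = (n1 + z**2 / 2) / (n + z**2)      # modified estimate of p
    half_width = z * sqrt(p_tilde * (1 - p_tilde) / (n + z**2))
    return p_tilde - half_width, p_tilde + half_width

def arcsine_interval(n1, n, alpha=0.05):
    z = z_value(alpha)
    centre = asin(sqrt(n1 / n))
    half_width = z / (2 * sqrt(n))
    return sin(centre - half_width) ** 2, sin(centre + half_width) ** 2

# 30 successes in 100 trials at the 95% confidence level.
for method in (wald_interval, agresti_coull_interval, arcsine_interval):
    print(method.__name__, method(30, 100))
```

For 30 successes in 100 trials the three intervals are similar; the differences between the methods become more visible for small n or for proportions near 0 or 1.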

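Wilson method

The notation in the formula below differs from the previous formulas in two respects:^[13]

First, z_x has a slightly different interpretation in the formula below: it has its ordinary meaning of the x-th quantile of the standard normal distribution, rather than being shorthand for the (1 − x)-th quantile.

Second, this formula does not use a plus-minus sign to define the two bounds. Instead, one uses [math]\displaystyle{ z = z_{\alpha / 2} }[/math] to get the lower bound and [math]\displaystyle{ z = z_{1 - \alpha/2} }[/math] to get the upper bound. For example, for a 95% confidence level the error rate is [math]\displaystyle{ \alpha }[/math] = 0.05, so the lower bound is obtained with [math]\displaystyle{ z = z_{\alpha/2} = z_{0.025} = - 1.96 }[/math] and the upper bound with [math]\displaystyle{ z = z_{1 - \alpha/2} = z_{0.975} = 1.96 }[/math].

[math]\displaystyle{ \frac{ \widehat{p\,} + \frac{z^2}{2n} + z \sqrt{ \frac{\widehat{p\,}(1 - \widehat{p\,})}{n} + \frac{z^2}{4 n^2} } }{ 1 + \frac{z^2}{n} } }[/math]

Comparison

The exact (Clopper-Pearson) method is the most conservative.^[10]

The Wald method, although commonly recommended in textbooks, is the most biased.

Related distributions

Ratio of two binomial distributions

Let X ~ B(n, p1) and Y ~ B(m, p2) be independent. Let T = (X/n)/(Y/m).^[12]

Then log(T) is approximately normally distributed with mean log(p1/p2) and variance ((1/p1) − 1)/n + ((1/p2) − 1)/m.

Conditional binomials

If X ~ B(n, p) and Y | X ~ B(X, q) (the conditional distribution of Y, given X), then Y is a simple binomial random variable with distribution Y ~ B(n, pq).

For example, imagine throwing n balls at a basket U_X, then taking the balls that hit and throwing them at another basket U_Y. If p is the probability of hitting U_X, then X ~ B(n, p) is the number of balls that hit U_X. If q is the probability of hitting U_Y, then the number of balls that hit U_Y is Y ~ B(X, q), and therefore Y ~ B(n, pq).

Proof: Since [math]\displaystyle{ X \sim B(n, p) }[/math] and [math]\displaystyle{ Y \sim B(X, q) }[/math], by the law of total probability,

[math]\displaystyle{ \begin{align} \Pr[Y = m] &= \sum_{k = m}^{n} \Pr[Y = m \mid X = k] \Pr[X = k] \\ &= \sum_{k=m}^n \binom{n}{k} \binom{k}{m} p^k q^m (1-p)^{n-k} (1-q)^{k-m} \end{align} }[/math]

Since [math]\displaystyle{ \tbinom{n}{k} \tbinom{k}{m} = \tbinom{n}{m} \tbinom{n-m}{k-m} }[/math], the equation above can be expressed as

[math]\displaystyle{ \Pr[Y = m] = \sum_{k=m}^{n} \binom{n}{m} \binom{n-m}{k-m} p^k q^m (1-p)^{n-k} (1-q)^{k-m} }[/math]

Factoring [math]\displaystyle{ p^k = p^m p^{k-m} }[/math] and pulling all the terms that do not depend on k out of the sum now yields

[math]\displaystyle{ \begin{align} \Pr[Y = m] &= \binom{n}{m} p^m q^m \left( \sum_{k=m}^n \binom{n-m}{k-m} p^{k-m} (1-p)^{n-k} (1-q)^{k-m} \right) \\ &= \binom{n}{m} (pq)^m \left( \sum_{i=0}^{n-m} \binom{n-m}{i} (p - pq)^i (1-p)^{n-m-i} \right) \\ &= \binom{n}{m} (pq)^m (1 - pq)^{n-m}, \end{align} }[/math]

where the second line substitutes i = k − m and the last line applies the binomial theorem, since (p − pq) + (1 − p) = 1 − pq.

Therefore [math]\displaystyle{ Y \sim B(n, pq) }[/math], as desired.

Bernoulli distribution

The Bernoulli distribution is a special case of the binomial distribution, where n = 1. Symbolically, X ~ B(1, p) has the same meaning as X ~ Bernoulli(p). Conversely, any binomial distribution B(n, p) is the distribution of the sum of n independent Bernoulli trials, each with the same probability p.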
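The statement that B(n, p) is the distribution of a sum of n independent Bernoulli(p) trials can be illustrated without simulation: the distribution of a sum of independent variables is the convolution of their distributions. The sketch below builds the sum one trial at a time and compares the result with the closed-form probability mass function. Function names and the values n = 12, p = 0.3 are assumptions made for the example.

```python
from math import comb, isclose

def convolve(a, b):
    """Distribution of the sum of two independent integer-valued variables
    whose PMFs are given as lists indexed from 0."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def sum_of_bernoulli_pmf(n, p):
    """PMF of X_1 + ... + X_n for independent Bernoulli(p) variables."""
    pmf = [1.0]                      # an empty sum equals 0 with certainty
    for _ in range(n):
        pmf = convolve(pmf, [1 - p, p])
    return pmf

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 12, 0.3
built_up = sum_of_bernoulli_pmf(n, p)
agrees = all(isclose(built_up[k], binom_pmf(k, n, p), rel_tol=1e-9)
             for k in range(n + 1))
print(agrees)
```

With n = 1 the loop runs once and returns [1 − p, p], which is exactly the Bernoulli(p) distribution.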


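This entry was translated by 南风, reviewed by Smile, and edited by 不是海绵宝宝. If you have any questions, you are welcome to raise them on the discussion page.

The content of this entry is drawn from Wikipedia and public sources and follows the CC3.0 license.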

[1] Feller, W. (1968). An Introduction to Probability Theory and Its Applications (Third ed.). New York: Wiley. p. 151 (theorem in section VI.3). https://archive.org/details/introductiontopr01wfel.

[2] Wadsworth, G. P. (1960). Introduction to Probability and Random Variables. New York: McGraw-Hill. p. 52. https://archive.org/details/introductiontopr0000wads.

[3] Jowett, G. H. (1963). "The Relationship Between the Binomial and F Distributions". Journal of the Royal Statistical Society D. 13 (1): 55–57. doi:10.2307/2986663. JSTOR 2986663.

[4] See also Nicolas, André (January 7, 2019). "Finding mode in Binomial distribution". Stack Exchange.

[5] Neumann, P. (1966). "Über den Median der Binomial- and Poissonverteilung". Wissenschaftliche Zeitschrift der Technischen Universität Dresden (in German). 19: 29–33.

[6] Lord, Nick (July 2010). "Binomial averages when the mean is an integer". The Mathematical Gazette. 94: 331–332.

[7] Kaas, R.; Buhrman, J.M. (1980). "Mean, Median and Mode in Binomial Distributions". Statistica Neerlandica. 34 (1): 13–18. doi:10.1111/j.1467-9574.1980.tb00681.x.

[8] Arratia, R.; Gordon, L. (1989). "Tutorial on large deviations for the binomial distribution". Bulletin of Mathematical Biology. 51 (1): 125–131. doi:10.1007/BF02458840. PMID 2706397. S2CID 189884382.

[9] Razzaghi, Mehdi (2002). "On the estimation of binomial success probability with zero occurrence in sample". Journal of Modern Applied Statistical Methods. 1 (2): 326–332. doi:10.22237/jmasm/1036110000.

[10] Brown, Lawrence D.; Cai, T. Tony; DasGupta, Anirban (2001). "Interval Estimation for a Binomial Proportion". Statistical Science. 16 (2): 101–133. CiteSeerX 10.1.1.323.7752. doi:10.1214/ss/1009213286. Retrieved 2015-01-05.

[11] Agresti, Alan; Coull, Brent A. (May 1998). "Approximate is better than 'exact' for interval estimation of binomial proportions" (PDF). The American Statistician. 52 (2): 119–126. doi:10.2307/2685469. JSTOR 2685469. Retrieved 2015-01-05.

[12] Pires, M. A. (2002). "Confidence intervals for a binomial proportion: comparison of methods and software evaluation". In Klinke, S.; Ahrend, P.; Richter, L. Proceedings of the Conference CompStat 2002. Short Communications and Posters. https://www.math.tecnico.ulisboa.pt/~apires/PDFs/AP_COMPSTAT02.pdf.

[13] Wilson, Edwin B. (June 1927). "Probable inference, the law of succession, and statistical inference" (PDF). Journal of the American Statistical Association. 22 (158): 209–212. doi:10.2307/2276774. JSTOR 2276774. Archived from the original (PDF) on 2015-01-13. Retrieved 2015-01-05.

[14] Katz, D.; Baptista, J.; Azen, S. P.; Pike, M. C. (1978). "Obtaining confidence intervals for the risk ratio in cohort studies". Biometrics. 34 (3): 469–474. doi:10.2307/2530610. JSTOR 2530610.

[15] Taboga, Marco. "Lectures on Probability Theory and Mathematical Statistics". statlect.com. Retrieved 18 December 2017.

[16] Wang, Y. H. (1993). "On the number of successes in independent trials" (PDF). Statistica Sinica. 3 (2): 295–312. Archived from the original (PDF) on 2016-03-03.
