KS test

Revision as of 11:42, 27 September 2020 (Sun)

268 bytes removed

→ Kolmogorov–Smirnov statistic

The [[empirical distribution function]] ''F''<sub>''n''</sub> for ''n'' [[Independent and identically distributed random variables|independent and identically distributed]] (i.i.d.) ordered observations ''X<sub>i</sub>'' is defined as

<math>F_n(x)={1 \over n}\sum_{i=1}^n I_{[-\infty,x]}(X_i)</math>

where <math>I_{[-\infty,x]}(X_i)</math> is the [[indicator function]], equal to 1 if <math>X_i \le x</math> and equal to 0 otherwise.
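As an illustration, the definition of ''F''<sub>''n''</sub> can be evaluated directly. This is a minimal sketch; the function name `empirical_cdf` is hypothetical and not part of any library. Sorting the sample once lets each evaluation count the observations with ''X''<sub>''i''</sub> ≤ ''x'' by binary search, which gives the same value as summing the indicator over all ''n'' points.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def empirical_cdf(sample: Sequence[float]) -> Callable[[float], float]:
    """Return F_n, the empirical distribution function of `sample`.

    F_n(x) = (1/n) * #{i : X_i <= x}, i.e. the average of the indicator
    I_{[-inf, x]}(X_i) over the n observations.
    """
    xs = sorted(sample)  # ordered observations X_(1) <= ... <= X_(n)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")

    def F_n(x: float) -> float:
        # bisect_right returns the number of sorted observations with X_i <= x
        return bisect_right(xs, x) / n

    return F_n


F = empirical_cdf([0.3, 0.1, 0.7, 0.5])
print(F(0.0), F(0.3), F(0.6), F(1.0))  # 0.0 0.5 0.75 1.0
```

Note that ''F''<sub>''n''</sub> is a right-continuous step function: at ''x'' = 0.3 it already includes the observation 0.3.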

The Kolmogorov–Smirnov statistic for a given cumulative distribution function ''F''(''x'') is

<math>D_n= \sup_x |F_n(x)-F(x)|</math>
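A minimal sketch of computing ''D''<sub>''n''</sub>, assuming the hypothesised ''F'' is continuous; the names `ks_statistic` and `uniform_cdf` are hypothetical. Because ''F''<sub>''n''</sub> is a step function that jumps only at the ordered observations, the supremum is reached at, or approached just before, one of those points, so only 2''n'' gaps need to be checked rather than every ''x''.

```python
from typing import Callable, Sequence


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """D_n = sup_x |F_n(x) - F(x)|, assuming the hypothesised CDF F is continuous.

    F_n only jumps at the ordered observations X_(1) <= ... <= X_(n). Just
    before the i-th jump the gap is F(X_(i)) - (i - 1)/n, and right at it the
    gap is i/n - F(X_(i)); the supremum is the largest of these values.
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


def uniform_cdf(x: float) -> float:
    """CDF of the Uniform(0, 1) distribution (hypothetical example F)."""
    return min(max(x, 0.0), 1.0)


print(ks_statistic([0.1, 0.3, 0.5, 0.7], uniform_cdf))
```

For this sample the result is 0.3 up to floating-point rounding, attained at ''x'' = 0.7, where ''F''<sub>''n''</sub> jumps to 1 while ''F''(0.7) = 0.7. In practice, `scipy.stats.kstest` computes the same statistic together with a p-value.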

where sup<sub>''x''</sub> is the supremum of the set of distances |''F''<sub>''n''</sub>(''x'') − ''F''(''x'')| over all ''x''. By the Glivenko–Cantelli theorem, if the sample comes from distribution ''F''(''x''), then ''D''<sub>''n''</sub> converges to 0 almost surely in the limit when ''n'' goes to infinity. Kolmogorov strengthened this result by effectively providing the rate of this convergence (see Kolmogorov distribution). Donsker's theorem provides a yet stronger result.
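The convergence described above can be illustrated with a small simulation, sketched here under stated assumptions: samples are drawn from Uniform(0, 1), which is also the hypothesised ''F'', and the limiting law used is the Kolmogorov distribution of √''n''·''D''<sub>''n''</sub> for continuous ''F'', with CDF P(''K'' ≤ ''t'') = 1 − 2 Σ<sub>''k''≥1</sub> (−1)<sup>''k''−1</sup> e<sup>−2''k''²''t''²</sup>. The helper names are hypothetical. ''D''<sub>''n''</sub> should shrink towards 0 as ''n'' grows, while √''n''·''D''<sub>''n''</sub> stays of order 1, which is the 1/√''n'' rate that Kolmogorov's result makes precise.

```python
import math
import random
from typing import Sequence


def ks_statistic_uniform(sample: Sequence[float]) -> float:
    """D_n of `sample` against the Uniform(0, 1) CDF, F(x) = x on [0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    # The supremum of |F_n - F| is reached at (or just before) an ordered observation.
    return max(max(i / n - x, x - (i - 1) / n) for i, x in enumerate(xs, start=1))


def kolmogorov_cdf(t: float, terms: int = 100) -> float:
    """Limiting CDF of sqrt(n) * D_n for continuous F:
    P(K <= t) = 1 - 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 t^2).
    The series is truncated after `terms` terms; the error is negligible
    unless t is very close to 0."""
    if t <= 0:
        return 0.0
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * t * t) for k in range(1, terms + 1))
    return 1.0 - 2.0 * s


rng = random.Random(0)  # fixed seed so the run is reproducible
for n in (10, 100, 1000, 10000):
    d = ks_statistic_uniform([rng.random() for _ in range(n)])
    # D_n tends to shrink towards 0 (Glivenko-Cantelli); sqrt(n) * D_n stays of order 1.
    print(f"n={n:6d}  D_n={d:.4f}  sqrt(n)*D_n={math.sqrt(n) * d:.3f}")

# Asymptotic probability that sqrt(n) * D_n exceeds 1.36 under the null hypothesis.
print(1 - kolmogorov_cdf(1.36))  # roughly 0.05
```

The simulated values depend on the random seed and are not reproduced here. The last line gives the asymptotic tail probability at ''t'' = 1.36, about 0.05, which is why √''n''·''D''<sub>''n''</sub> > 1.36 is often used as an approximate 5% rejection rule for large ''n''.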

In practice, the statistic requires a relatively large number of data points (in comparison to other goodness-of-fit criteria such as the Anderson–Darling test statistic) to properly reject the null hypothesis.

Jie (961 edits)