In practice, the statistic requires a relatively large number of data points (in comparison to other goodness-of-fit criteria, such as the Anderson–Darling test statistic) to properly reject the null hypothesis.
==Kolmogorov distribution==
The Kolmogorov distribution is the distribution of the [[random variable]]
 
:<math>K=\sup_{t\in[0,1]}|B(t)|</math>
 
where ''B''(''t'') is the [[Brownian bridge]]. The [[cumulative distribution function]] of ''K'' is given by<ref>{{Cite journal |vauthors=Marsaglia G, Tsang WW, Wang J |year=2003 |title=Evaluating Kolmogorov's Distribution |journal=Journal of Statistical Software |volume=8 |issue=18 |pages=1–4 |doi=10.18637/jss.v008.i18 |doi-access=free }}</ref>
:<math>\operatorname{Pr}(K\leq x)=1-2\sum_{k=1}^\infty (-1)^{k-1} e^{-2k^2 x^2}=\frac{\sqrt{2\pi}}{x}\sum_{k=1}^\infty e^{-(2k-1)^2\pi^2/(8x^2)},</math>
which can also be expressed by the [[Jacobi theta function]] <math>\vartheta_{01}(z=0;\tau=2ix^2/\pi)</math>. Both the form of the Kolmogorov–Smirnov test statistic and its asymptotic distribution under the null hypothesis were published by [[Andrey Kolmogorov]],<ref name=AK>{{Cite journal |author=Kolmogorov A |year=1933 |title=Sulla determinazione empirica di una legge di distribuzione |journal=G. Ist. Ital. Attuari |volume=4 |pages=83–91}}</ref> while a table of the distribution was published by [[Nikolai Smirnov (mathematician)|Nikolai Smirnov]].<ref>{{Cite journal |author=Smirnov N |year=1948 |title=Table for estimating the goodness of fit of empirical distributions |journal=[[Annals of Mathematical Statistics]] |volume=19 |issue=2 |pages=279–281 |doi=10.1214/aoms/1177730256|doi-access=free }}</ref> Recurrence relations for the distribution of the test statistic in finite samples are available.<ref name=AK/>
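Both expansions of the cumulative distribution function converge very quickly and are easy to evaluate numerically. The following is a minimal illustrative sketch (the function names are ours, not from any published implementation) checking the two series against each other:

```python
import math

def kolmogorov_cdf(x, terms=100):
    """Pr(K <= x) via the alternating series 1 - 2*sum_{k>=1} (-1)^(k-1) exp(-2 k^2 x^2)."""
    if x <= 0:
        return 0.0
    return 1.0 - 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                           for k in range(1, terms + 1))

def kolmogorov_cdf_theta(x, terms=100):
    """Equivalent theta-function form: sqrt(2*pi)/x * sum_{k>=1} exp(-(2k-1)^2 pi^2 / (8 x^2))."""
    if x <= 0:
        return 0.0
    return (math.sqrt(2.0 * math.pi) / x) * sum(
        math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8.0 * x * x))
        for k in range(1, terms + 1))

# The two expansions agree to machine precision; Pr(K <= 1.36) is about 0.95,
# which is why 1.36 appears in tables as the familiar 5% critical value.
```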
Under the null hypothesis that the sample comes from the hypothesized distribution ''F''(''x''),

:<math>\sqrt{n}D_n\xrightarrow{n\to\infty}\sup_t |B(F(t))|</math>

[[convergence of random variables|in distribution]], where ''B''(''t'') is the [[Brownian bridge]]. The asymptotic power of the Kolmogorov–Smirnov test is 1. Fast and accurate algorithms to compute the cdf <math>\operatorname{Pr}(D_n \leq x)</math> or its complement for arbitrary <math>n</math> and <math>x</math> are available.
If ''F'' is continuous then under the null hypothesis <math>\sqrt{n}D_n</math> converges to the Kolmogorov distribution, which does not depend on ''F''. This result may also be known as the Kolmogorov theorem. The accuracy of this limit as an approximation to the exact cdf of <math>K</math> when <math>n</math> is finite is not very impressive: even when <math>n=1000</math>, the corresponding maximum error is about <math>0.9\%</math>; this error increases to <math>2.6\%</math> when <math>n=100</math> and to a totally unacceptable <math>7\%</math> when <math>n=10</math>. However, the very simple expedient of replacing <math>x</math> by

:<math>x+\frac{1}{6\sqrt{n}}+ \frac{x-1}{4n}</math>

in the argument of the Jacobi theta function reduces these errors to <math>0.003\%</math>, <math>0.027\%</math>, and <math>0.27\%</math> respectively; such accuracy would usually be considered more than adequate for all practical applications.<ref>{{Cite journal |vauthors=Vrbik, Jan |year=2018 |title=Small-Sample Corrections to Kolmogorov–Smirnov Test Statistic |journal=Pioneer Journal of Theoretical and Applied Statistics |volume=15 |issue=1–2 |pages=15–23}}</ref>

When parameters are estimated from the data, the question arises which estimation method should be used. Usually this would be the maximum likelihood method, but, e.g., for the normal distribution MLE has a large bias error on sigma. Using a moment fit or KS minimization instead has a large impact on the critical values, and also some impact on test power. If we need to decide via the KS test whether Student-t data with df&nbsp;=&nbsp;2 could be normal or not, then an ML estimate based on H0 (data is normal, so using the standard deviation for scale) would give a much larger KS distance than a fit with minimum KS. In this case we should reject H0, which is often the case with MLE, because the sample standard deviation might be very large for t-2 data, but with KS minimization we may still get a KS value too low to reject&nbsp;H0. In the Student-t case, a modified KS test with a KS estimate instead of the MLE makes the KS test indeed slightly worse. However, in other cases such a modified KS test leads to slightly better test power.
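The small-sample correction above amounts to evaluating the limiting cdf at a shifted argument. A minimal sketch (function names are illustrative, not a published implementation):

```python
import math

def kolmogorov_cdf(x, terms=100):
    """Limiting cdf Pr(K <= x) = 1 - 2*sum_{k>=1} (-1)^(k-1) exp(-2 k^2 x^2)."""
    if x <= 0:
        return 0.0
    return 1.0 - 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                           for k in range(1, terms + 1))

def corrected_cdf(x, n):
    """Small-sample correction: evaluate the limit at the shifted argument
    x + 1/(6*sqrt(n)) + (x - 1)/(4*n)."""
    return kolmogorov_cdf(x + 1.0 / (6.0 * math.sqrt(n)) + (x - 1.0) / (4.0 * n))
```

At <math>x=1</math> the shift is <math>1/(6\sqrt{n})</math>, so the corrected value lies slightly above the uncorrected limit.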
The ''goodness-of-fit'' test or the Kolmogorov–Smirnov test can be constructed by using the critical values of the Kolmogorov distribution. This test is asymptotically valid when <math>n \to\infty</math>. It rejects the null hypothesis at level <math>\alpha</math> if

:<math>\sqrt{n}D_n>K_\alpha,\,</math>

where ''K''<sub>''α''</sub> is found from

:<math>\operatorname{Pr}(K\leq K_\alpha)=1-\alpha.\,</math>

Under the assumption that <math>F(x)</math> is non-decreasing and right-continuous, with a countable (possibly infinite) number of jumps, the KS test statistic can be expressed as

:<math>D_n= \sup_x |F_n(x)-F(x)| = \sup_{0 \leq t \leq 1} |F_n(F^{-1}(t)) - F(F^{-1}(t))|.</math>

From the right-continuity of <math>F(x)</math>, it follows that <math>F(F^{-1}(t)) \geq t</math> and <math>F^{-1}(F(x)) \leq x</math>, and hence the distribution of <math>D_{n}</math> depends on the null distribution <math>F(x)</math>, i.e., it is no longer distribution-free as in the continuous case. Therefore, a fast and accurate method has been developed to compute the exact and asymptotic distribution of <math>D_{n}</math> when <math>F(x)</math> is purely discrete or mixed, as part of the dgof package of the R language. Major statistical packages, among which SAS PROC NPAR1WAY and Stata ksmirnov, implement the KS test under the assumption that <math>F(x)</math> is continuous, which is more conservative if the null distribution is actually not continuous.
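The decision rule <math>\sqrt{n}D_n>K_\alpha</math> is straightforward to implement for a continuous null cdf: solve <math>\operatorname{Pr}(K\leq K_\alpha)=1-\alpha</math> numerically and compare. A minimal sketch (helper names are illustrative):

```python
import math

def kolmogorov_cdf(x, terms=100):
    """Limiting cdf Pr(K <= x) of the Kolmogorov distribution."""
    if x <= 0:
        return 0.0
    return 1.0 - 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                           for k in range(1, terms + 1))

def k_alpha(alpha, lo=0.01, hi=5.0, iters=100):
    """Solve Pr(K <= K_alpha) = 1 - alpha by bisection (the cdf is increasing)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kolmogorov_cdf(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ks_reject(sample, cdf, alpha=0.05):
    """One-sample KS test against a continuous cdf: reject if sqrt(n)*D_n > K_alpha."""
    xs = sorted(sample)
    n = len(xs)
    d_n = max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))
    return math.sqrt(n) * d_n > k_alpha(alpha)
```

For example, `k_alpha(0.05)` recovers the familiar critical value <math>K_{0.05}\approx 1.358</math>, and a well-spread uniform sample tested against <math>F(x)=x</math> is (correctly) not rejected.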
==Two-sample Kolmogorov–Smirnov test==
 