== Two-sample Kolmogorov–Smirnov test ==
[[File:KS2 Example.png|thumb|300px|Illustration of the two-sample Kolmogorov–Smirnov statistic. Red and blue lines each correspond to an empirical distribution function, and the black arrow is the two-sample KS statistic.]]
The Kolmogorov–Smirnov test may also be used to test whether two underlying one-dimensional probability distributions differ. In this case, the Kolmogorov–Smirnov statistic is

:<math>D_{n,m}=\sup_x |F_{1,n}(x)-F_{2,m}(x)|,</math>
where <math>F_{1,n}</math> and <math>F_{2,m}</math> are the [[empirical distribution function]]s of the first and the second sample respectively, and <math>\sup</math> is the [[Infimum and supremum|supremum function]].
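As a concrete illustration (a minimal NumPy sketch, not from the original article; the function name is mine), the statistic can be computed by evaluating both empirical distribution functions at every observed point, since the supremum is attained at a sample value:

```python
import numpy as np

def ks_2samp_statistic(x, y):
    """D_{n,m} = sup_x |F_{1,n}(x) - F_{2,m}(x)| for two 1-D samples."""
    x, y = np.sort(np.asarray(x)), np.sort(np.asarray(y))
    pts = np.concatenate([x, y])  # the sup is attained at a data point
    # ECDF of each sample evaluated at all observed points
    f1 = np.searchsorted(x, pts, side="right") / len(x)
    f2 = np.searchsorted(y, pts, side="right") / len(y)
    return np.abs(f1 - f2).max()
```

For identical samples the statistic is 0, and it approaches 1 for samples drawn from well-separated distributions.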
For large samples, the null hypothesis is rejected at level <math>\alpha</math> if
:<math>D_{n,m}>c(\alpha)\sqrt{\frac{n + m}{n\cdot m}}.</math>
Here <math>n</math> and <math>m</math> are the sizes of the first and second sample respectively. The value of <math>c({\alpha})</math> is given in the table below for the most common levels of <math>\alpha</math>.
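Numerically, the large-sample rejection rule above is straightforward to apply. The sketch below (helper names are mine, assuming the asymptotic <math>c(\alpha)</math> formula given in this section) also checks that the two algebraic forms of the threshold agree:

```python
import math

def ks_2samp_threshold(n, m, alpha=0.05):
    """Critical value c(alpha) * sqrt((n + m) / (n * m)) for the
    large-sample two-sample KS test at level alpha."""
    c_alpha = math.sqrt(-math.log(alpha / 2) * 0.5)
    return c_alpha * math.sqrt((n + m) / (n * m))

def ks_2samp_threshold_alt(n, m, alpha=0.05):
    """Equivalent form (1/sqrt(n)) * sqrt(-ln(alpha/2) * (1 + n/m) / 2),
    obtained by rewriting (n + m)/(n*m) as (1 + n/m)/n."""
    return math.sqrt(-math.log(alpha / 2) * (1 + n / m) / 2) / math.sqrt(n)
```

For <math>n = m = 100</math> at <math>\alpha = 0.05</math> both forms give about 0.192; the null hypothesis is rejected when <math>D_{n,m}</math> exceeds this value.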
Here, again, the larger the sample sizes, the more sensitive the minimal bound: for a given ratio of sample sizes (e.g. <math>m = n</math>), the minimal bound scales with the size of either of the samples according to its inverse square root.
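The tabulated critical values can be reproduced from the asymptotic formula <math>c(\alpha)=\sqrt{-\ln(\alpha/2)\cdot\tfrac{1}{2}}</math>; a quick check in plain Python (illustrative only):

```python
import math

# Reproduce c(alpha) for the most common significance levels
alphas = [0.20, 0.15, 0.10, 0.05, 0.025, 0.01, 0.005, 0.001]
c = {a: math.sqrt(-math.log(a / 2) * 0.5) for a in alphas}
for a in alphas:
    print(f"alpha = {a:<5}  c(alpha) = {c[a]:.3f}")
```

This prints 1.073, 1.138, 1.224, 1.358, 1.480, 1.628, 1.731 and 1.949, matching the table.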
{| class="wikitable"
|-
| <math>\alpha</math> || 0.20 || 0.15 || 0.10 || 0.05 || 0.025 || 0.01 || 0.005 || 0.001
|-
| <math>c({\alpha})</math> || 1.073 || 1.138 || 1.224 || 1.358 || 1.48 || 1.628 || 1.731 || 1.949
|}

and in general<ref>Eq. (15) in Section 3.3.1 of Knuth, D.E., The Art of Computer Programming, Volume 2 (Seminumerical Algorithms), 3rd Edition, Addison Wesley, Reading Mass, 1998.</ref> by

:<math>c\left(\alpha\right)=\sqrt{-\ln\left(\tfrac{\alpha}{2}\right)\cdot \tfrac{1}{2}},</math>

so that the condition reads

:<math>D_{n,m}>\frac{1}{\sqrt{n}}\cdot\sqrt{-\ln\left(\tfrac{\alpha}{2}\right)\cdot \tfrac{1 + \tfrac{n}{m}}{2}}.</math>

Note that the two-sample test checks whether the two data samples come from the same distribution. This does not specify what that common distribution is (e.g. whether it is normal or not). Again, tables of critical values have been published. A shortcoming of the Kolmogorov–Smirnov test is that it is not very powerful, because it is devised to be sensitive against all possible types of differences between two distribution functions. <ref>{{cite journal |last1=Marozzi |first1=Marco |title=Some Notes on the Location-Scale Cucconi Test |journal=Journal of Nonparametric Statistics |date=2009 |volume=21 |issue=5 |pages=629–647 |doi=10.1080/10485250902952435 }}</ref> and <ref>{{cite journal |last1=Marozzi |first1=Marco |title=Nonparametric Simultaneous Tests for Location and Scale Testing: a Comparison of Several Methods |journal=Communications in Statistics – Simulation and Computation |date=2013 |volume=42 |issue=6 |pages=1298–1317 |doi=10.1080/03610918.2012.665546 }}</ref> showed evidence that the [[Cucconi test]], originally proposed for simultaneously comparing location and scale, is much more powerful than the Kolmogorov–Smirnov test when comparing two distribution functions.

A distribution-free multivariate Kolmogorov–Smirnov goodness of fit test has been proposed by Justel, Peña and Zamar (1997). The test uses a statistic which is built using Rosenblatt's transformation, and an algorithm is developed to compute it in the bivariate case. An approximate test that can easily be computed in any dimension is also presented.

One approach to generalizing the Kolmogorov–Smirnov statistic to higher dimensions which meets the above concern is to compare the cdfs of the two samples with all possible orderings, and take the largest of the set of resulting K–S statistics. In ''d'' dimensions, there are 2<sup>''d''</sup>&nbsp;−&nbsp;1 such orderings. One such variation is due to Peacock (see also Gosset for a 3D version) and another to Fasano and Franceschini (see Lopes et al. for a comparison and computational details). Critical values for the test statistic can be obtained by simulations, but depend on the dependence structure in the joint distribution.

In one dimension, the Kolmogorov–Smirnov statistic is identical to the so-called star discrepancy D, so another native KS extension to higher dimensions would be simply to use D also for higher dimensions. Unfortunately, the star discrepancy is hard to calculate in high dimensions.

The Kolmogorov–Smirnov test (in its one-sample and two-sample forms, which verify the equality of distributions) is implemented in many software programs:

==Setting confidence limits for the shape of a distribution function==

While the Kolmogorov–Smirnov test is usually used to test whether a given F(x) is the underlying probability distribution of Fn(x), the procedure may be inverted to give confidence limits on F(x) itself. If one chooses a critical value of the test statistic Dα such that P(Dn&nbsp;>&nbsp;Dα) = α, then a band of width ±Dα around Fn(x) will entirely contain F(x) with probability 1&nbsp;−&nbsp;α.
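The band construction can be sketched as follows (a minimal NumPy sketch, not from the original article; it assumes the standard large-sample approximation P(Dn&nbsp;>&nbsp;x) ≈ 2·exp(−2n·x²), which gives Dα ≈ √(−ln(α/2)/(2n)); the function name is illustrative):

```python
import numpy as np

def ks_confidence_band(sample, alpha=0.05):
    """1 - alpha confidence band F_n(x) +/- D_alpha around the ECDF,
    using the asymptotic critical value D_alpha = sqrt(-ln(alpha/2)/(2n))."""
    x = np.sort(np.asarray(sample))
    n = len(x)
    ecdf = np.arange(1, n + 1) / n          # F_n evaluated at the sorted points
    d_alpha = np.sqrt(-np.log(alpha / 2) / (2 * n))
    lower = np.clip(ecdf - d_alpha, 0.0, 1.0)
    upper = np.clip(ecdf + d_alpha, 0.0, 1.0)
    return x, lower, upper
```

Note that the band is clipped to [0, 1], since a distribution function cannot leave that range.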
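For example, in Python the two-sample test is available in SciPy (a usage sketch; SciPy is one of the many implementations referred to above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, size=200)
y = rng.normal(loc=0.5, size=150)

# Two-sample KS test: returns the statistic D_{n,m} and its p-value
res = stats.ks_2samp(x, y)
print(res.statistic, res.pvalue)
```

A small p-value indicates that the two samples are unlikely to come from the same distribution.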