{{Short description|Non-parametric statistical test between two distributions}}
 
[[File:KS Example.png|thumb|300px|Illustration of the Kolmogorov–Smirnov statistic. Red line is [[Cumulative distribution function|CDF]], blue line is an [[Empirical distribution function|ECDF]], and the black arrow is the K–S statistic.]]
      
In [[statistics]], the '''Kolmogorov–Smirnov test''' ('''K–S test''' or '''KS test''') is a [[nonparametric statistics|nonparametric test]] of the equality of continuous (or discontinuous, see [[#Discrete and mixed null distribution|Section 2.2]]), one-dimensional [[probability distribution]]s that can be used to compare a [[random sample|sample]] with a reference probability distribution (one-sample K–S test), or to compare two samples (two-sample K–S test).  It is named after  [[Andrey Kolmogorov]] and [[Nikolai Smirnov (mathematician)|Nikolai Smirnov]].
 
The Kolmogorov–Smirnov statistic quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples. The null distribution of this statistic is calculated under the null hypothesis that the sample is drawn from the reference distribution (in the one-sample case) or that the samples are drawn from the same distribution (in the two-sample case). In the one-sample case, the distribution considered under the null hypothesis may be continuous (see Section 2), purely discrete or mixed (see Section 2.2). In the two-sample case (see Section 3), the distribution considered under the null hypothesis is a continuous distribution but is otherwise unrestricted.
 
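As an illustrative sketch (not part of the test's definition), both forms of the test can be run with SciPy; the <code>scipy.stats.kstest</code> and <code>scipy.stats.ks_2samp</code> routines used here are SciPy's API:

<syntaxhighlight lang="python">
# A minimal sketch of both forms of the K-S test using SciPy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# One-sample test: compare a sample against a fully specified
# reference distribution (here the standard normal CDF).
sample = rng.normal(loc=0.0, scale=1.0, size=200)
stat, p = stats.kstest(sample, "norm")
print(f"one-sample: D_n = {stat:.4f}, p = {p:.4f}")

# Two-sample test: compare the empirical distribution functions
# of two independent samples against each other.
other = rng.normal(loc=0.5, scale=1.0, size=200)
stat2, p2 = stats.ks_2samp(sample, other)
print(f"two-sample: D = {stat2:.4f}, p = {p2:.4f}")
</syntaxhighlight>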
The two-sample K–S test is one of the most useful and general nonparametric methods for comparing two samples, as it is sensitive to differences in both location and shape of the empirical cumulative distribution functions of the two samples.
 
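A brief sketch of this sensitivity, using two samples with equal means but different spread (the distributions and sample sizes are arbitrary illustrative choices):

<syntaxhighlight lang="python">
# Sketch: the two-sample K-S test reacts to shape differences even
# when the means coincide.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(loc=0.0, scale=1.0, size=500)   # narrow
b = rng.normal(loc=0.0, scale=2.0, size=500)   # wide, same mean

# A location-only test such as Welch's t-test sees nearly equal means...
print("t-test p:", stats.ttest_ind(a, b, equal_var=False).pvalue)
# ...while the K-S test picks up the difference in spread/shape.
print("K-S   p:", stats.ks_2samp(a, b).pvalue)
</syntaxhighlight>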
The Kolmogorov–Smirnov test can be modified to serve as a [[goodness of fit]] test. In the special case of testing for [[Normal distribution|normality]] of the distribution, samples are standardized and compared with a standard normal distribution. This is equivalent to setting the mean and variance of the reference distribution equal to the sample estimates, and it is known that using these to define the specific reference distribution changes the null distribution of the test statistic (see [[#Test with estimated parameters|Test with estimated parameters]]). Various studies have found that, even in this corrected form, the test is less powerful for testing normality than the [[Shapiro–Wilk test]] or [[Anderson–Darling test]].<ref>{{cite journal | first = M. A. | last = Stephens | year = 1974 | title = EDF Statistics for Goodness of Fit and Some Comparisons | journal = Journal of the American Statistical Association | volume = 69 | issue = 347 | pages = 730–737 | jstor = 2286009 | doi = 10.2307/2286009 }}</ref> However, these other tests have their own disadvantages. For instance the Shapiro–Wilk test is known not to work well in samples with many identical values.
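A small Monte Carlo sketch of the parameter-estimation effect described above, assuming SciPy's <code>kstest</code>; the sample size, replication count, and significance level are arbitrary choices for the demonstration:

<syntaxhighlight lang="python">
# Sketch (Monte Carlo): plugging sample estimates of the mean and
# standard deviation into the reference normal CDF changes the null
# distribution of the K-S statistic -- naive p-values become far
# too conservative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 50, 2000
p_known, p_est = [], []
for _ in range(reps):
    x = rng.normal(size=n)                      # H0 is true
    p_known.append(stats.kstest(x, "norm").pvalue)
    # Same test, but with the reference distribution fitted to x:
    p_est.append(stats.kstest(x, "norm",
                              args=(x.mean(), x.std(ddof=1))).pvalue)

# Under H0, p-values should be uniform: ~5% should fall below 0.05.
print("rejection rate, known params:    ", np.mean(np.array(p_known) < 0.05))
print("rejection rate, estimated params:", np.mean(np.array(p_est) < 0.05))
# The second rate is far below 0.05, i.e. the nominal level is wrong.
</syntaxhighlight>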
           −
==Kolmogorov–Smirnov statistic==
      
The [[empirical distribution function]] ''F''<sub>''n''</sub> for ''n'' [[Independent and identically distributed random variables|independent and identically distributed]] (i.i.d.) ordered observations ''X<sub>i</sub>'' is defined as
 
:<math>F_n(x)={1 \over n}\sum_{i=1}^n I_{[-\infty,x]}(X_i)</math>
 
where <math>I_{[-\infty,x]}(X_i)</math> is the [[indicator function]], equal to 1 if <math>X_i \le x</math> and equal to 0 otherwise.
 
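As a minimal sketch, this definition transcribes directly into code (the helper name <code>ecdf</code> is hypothetical):

<syntaxhighlight lang="python">
# A direct transcription of the definition: F_n(x) is the fraction of
# observations X_i with X_i <= x (the indicator sum divided by n).
import numpy as np

def ecdf(sample, x):
    """Empirical distribution function F_n evaluated at x."""
    sample = np.asarray(sample)
    return np.mean(sample <= x)    # (1/n) * sum of indicators

xs = [0.2, 0.7, 0.5, 0.9]
print(ecdf(xs, 0.5))   # 0.5: two of the four observations are <= 0.5
</syntaxhighlight>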
The Kolmogorov–Smirnov [[statistic]] for a given [[cumulative distribution function]] ''F''(''x'') is
:<math>D_n= \sup_x |F_n(x)-F(x)|</math>
 
where sup<sub>''x''</sub> is the [[supremum]] of the set of distances. By the [[Glivenko–Cantelli theorem]], if the sample comes from distribution ''F''(''x''), then ''D''<sub>''n''</sub> converges to 0 [[almost surely]] in the limit when <math>n</math> goes to infinity. Kolmogorov strengthened this result, by effectively providing the rate of this convergence (see [[Kolmogorov-Smirnov test#Kolmogorov distribution|Kolmogorov distribution]]). [[Donsker's theorem]] provides a yet stronger result.
 
The Kolmogorov distribution is the distribution of the random variable

:<math>K=\sup_{t\in[0,1]}|B(t)|</math>

where ''B''(''t'') is the [[Brownian bridge]].
      
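Since ''F''<sub>''n''</sub> is a step function that jumps at the observations, the supremum is attained at those jump points; the sketch below computes ''D''<sub>''n''</sub> that way and compares the asymptotic p-value from <code>scipy.stats.kstwobign</code> (SciPy's name for the limiting distribution of <math>\sqrt{n}\,D_n</math>) against <code>scipy.stats.kstest</code>:

<syntaxhighlight lang="python">
# Sketch: compute D_n directly at the jump points of F_n (the sup over
# x is attained there), then use the asymptotic Kolmogorov distribution
# (scipy.stats.kstwobign, the law of sqrt(n) * D_n) for a p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = np.sort(rng.normal(size=100))
n = len(x)
F = stats.norm.cdf(x)                      # reference CDF at the data

# F_n jumps from (i-1)/n to i/n at x_(i); check both sides of each jump.
d_plus = np.max(np.arange(1, n + 1) / n - F)
d_minus = np.max(F - np.arange(0, n) / n)
D_n = max(d_plus, d_minus)

p_asymptotic = stats.kstwobign.sf(np.sqrt(n) * D_n)
print(f"D_n = {D_n:.4f}, asymptotic p = {p_asymptotic:.4f}")
print("scipy kstest:", stats.kstest(x, "norm"))
</syntaxhighlight>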
In practice, the statistic requires a relatively large number of data points (in comparison to other goodness of fit criteria such as the [[Anderson–Darling test]] statistic) to properly reject the null hypothesis.
 
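A quick simulation sketch of this sample-size effect (the 0.3 location shift and the run counts are arbitrary illustrative choices):

<syntaxhighlight lang="python">
# Sketch: with few observations the test rarely rejects even under a
# clear alternative; power grows with the sample size.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
for n in (20, 200):
    rejections = sum(
        stats.kstest(rng.normal(loc=0.3, size=n), "norm").pvalue < 0.05
        for _ in range(1000)
    )
    print(f"n = {n:3d}: rejected H0 in {rejections / 1000:.0%} of runs")
</syntaxhighlight>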