'''Initialization methods'''

Commonly used initialization methods are Forgy and Random Partition.<ref name="hamerly4">{{Cite conference |last1=Hamerly |first1=Greg |last2=Elkan |first2=Charles |year=2002 |title=Alternatives to the ''k''-means algorithm that find better clusterings |url=http://people.csail.mit.edu/tieu/notebook/kmeans/15_p600-hamerly.pdf |booktitle=Proceedings of the eleventh international conference on Information and knowledge management (CIKM) }}</ref> The Forgy method randomly chooses ''k'' observations from the dataset and uses them as the initial means. The Random Partition method first randomly assigns a cluster to each observation and then proceeds to the update step, computing the initial means as the centroids of each cluster's randomly assigned points. The Forgy method tends to spread the initial means out, while Random Partition places all of them close to the center of the dataset. According to Hamerly et al.,<ref name="hamerly4" /> the Random Partition method is generally preferable for algorithms such as [[K-harmonic means]] and [[fuzzy k-means]]; for expectation maximization and the standard k-means algorithm, the Forgy method of initialization is preferable. However, a comprehensive study by Celebi et al.<ref>{{cite journal |last1=Celebi |first1=M. E. |last2=Kingravi |first2=H. A. |last3=Vela |first3=P. A. |year=2013 |title=A comparative study of efficient initialization methods for the ''k''-means clustering algorithm |journal=[[Expert Systems with Applications]] |volume=40 |issue=1 |pages=200&ndash;210 |arxiv=1209.1960 |doi=10.1016/j.eswa.2012.07.021 }}</ref> found that popular initialization methods such as Forgy, Random Partition, and Maximin often perform poorly, whereas the approach of Bradley and Fayyad<ref>{{Cite conference |last1=Bradley |first1=Paul S. |last2=Fayyad |first2=Usama M. |year=1998 |title=Refining Initial Points for ''k''-Means Clustering |book-title=Proceedings of the Fifteenth International Conference on Machine Learning }}</ref> performs best and k-means++ performs generally well.
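As an illustration, the two initialization schemes can be sketched in a few lines of NumPy. The function names here are ours, not from any library, and the Random Partition sketch assumes every cluster receives at least one point:

```python
import numpy as np

def forgy_init(X, k, rng):
    """Forgy: pick k distinct observations at random as the initial means."""
    idx = rng.choice(len(X), size=k, replace=False)
    return X[idx]

def random_partition_init(X, k, rng):
    """Random Partition: randomly assign each point to a cluster, then use
    each cluster's centroid as the initial mean.  Assumes no cluster ends
    up empty, which holds with high probability when len(X) >> k."""
    labels = rng.integers(0, k, size=len(X))
    return np.array([X[labels == j].mean(axis=0) for j in range(k)])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
means_forgy = forgy_init(X, 3, rng)          # tend to be spread out
means_part = random_partition_init(X, 3, rng)  # tend to sit near the data center
```

Because each Random Partition mean averages roughly len(X)/k random points, all of them land close to the overall data centroid, matching the behavior described above.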
    
<gallery class="center" widths="300px" heights="300px" caption="Demonstration of the standard algorithm">
The algorithm does not guarantee convergence to the global optimum, and the result may depend on the initial clusters. As the algorithm is usually fast, it is common to run it multiple times with different starting conditions. However, worst-case performance can be slow: in particular, certain point sets, even in two dimensions, converge in exponential time, that is {{math|2<sup>Ω(<var>n</var>)</sup>}}.<ref>{{cite journal |last=Vattani |first=A. |year=2011 |title=k-means requires exponentially many iterations even in the plane |url=http://cseweb.ucsd.edu/users/avattani/papers/kmeans-journal.pdf |journal=[[Discrete and Computational Geometry]] |volume=45 |issue=4 |pages=596&ndash;616 |doi=10.1007/s00454-011-9340-1 }}</ref> These point sets do not seem to arise in practice; this is corroborated by the fact that the smoothed running time of k-means is polynomial.<ref name="Arthur, David; Manthey, B.; Roeglin, H. 20092">{{cite conference |last1=Arthur |first1=David |last2=Manthey |first2=B. |last3=Roeglin |first3=H. |year=2009 |title=k-means has polynomial smoothed complexity |booktitle=Proceedings of the 50th Symposium on Foundations of Computer Science (FOCS) |arxiv=0904.1113 }}</ref>
    
The "assignment" step is referred to as the "expectation step", while the "update" step is the maximization step, making this algorithm a variant of the generalized expectation-maximization algorithm.
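A minimal NumPy sketch of the standard (Lloyd's) algorithm makes the correspondence concrete: the nearest-mean assignment is the expectation step and the centroid recomputation is the maximization step. The function names are illustrative, and the restart helper follows the common practice, noted above, of keeping the run with the lowest within-cluster sum of squares:

```python
import numpy as np

def kmeans(X, init_means, n_iter=100):
    """Standard (Lloyd's) k-means, alternating the two steps until the
    means stop moving."""
    means = init_means.astype(float).copy()
    for _ in range(n_iter):
        # Assignment ("expectation") step: each point joins its nearest mean.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update ("maximization") step: each mean becomes its cluster's centroid.
        new_means = np.array([X[labels == j].mean(axis=0)
                              for j in range(len(means))])
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, labels

def best_of(X, k, n_restarts, rng):
    """Run k-means from several Forgy-style initializations and keep the
    result with the lowest within-cluster sum of squares."""
    best = None
    for _ in range(n_restarts):
        init = X[rng.choice(len(X), size=k, replace=False)]
        means, labels = kmeans(X, init)
        sse = ((X - means[labels]) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, means, labels)
    return best

rng = np.random.default_rng(1)
# Two well-separated blobs, so the intended clustering is unambiguous.
X = np.vstack([rng.normal(-3, 0.5, size=(50, 2)),
               rng.normal(3, 0.5, size=(50, 2))])
# Deterministic initialization with one point drawn from each blob.
means, labels = kmeans(X, np.vstack([X[0], X[50]]))
sse_best, means_best, labels_best = best_of(X, 2, 5, rng)
```

On this data a single run already recovers the two blob centroids; the restart wrapper matters precisely because, as stated above, the result can depend on the initial clusters.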