Commonly used initialization methods are the Forgy and Random Partition methods.<ref name="hamerly4">{{Cite conference |last1=Hamerly |first1=Greg |last2=Elkan |first2=Charles |year=2002 |title=Alternatives to the ''k''-means algorithm that find better clusterings |url=http://people.csail.mit.edu/tieu/notebook/kmeans/15_p600-hamerly.pdf |booktitle=Proceedings of the eleventh international conference on Information and knowledge management (CIKM) }}</ref> The Forgy method randomly chooses ''k'' observations from the dataset and uses them as the initial means. The Random Partition method first randomly assigns a cluster to each observation and then proceeds to the update step, thus computing each initial mean as the centroid of the points randomly assigned to that cluster. The Forgy method tends to spread the initial means out, while Random Partition places all of them close to the center of the dataset. According to Hamerly et al.,<ref name="hamerly4"></ref> the Random Partition method is generally preferable for algorithms such as [[K-harmonic means]] and [[fuzzy k-means]]; for expectation maximization and the standard ''k''-means algorithm, the Forgy method of initialization is preferable. However, a comprehensive study by Celebi et al.<ref>{{cite journal |last1=Celebi |first1=M. E. |last2=Kingravi |first2=H. A. |last3=Vela |first3=P. A. |year=2013 |title=A comparative study of efficient initialization methods for the ''k''-means clustering algorithm |journal=[[Expert Systems with Applications]] |volume=40 |issue=1 |pages=200–210 |arxiv=1209.1960 |doi=10.1016/j.eswa.2012.07.021 }}</ref> found that popular initialization methods such as Forgy, Random Partition, and Maximin often perform poorly, whereas the approach of Bradley and Fayyad<ref>{{Cite conference |last1=Bradley |first1=Paul S. |last2=Fayyad |first2=Usama M. |year=1998 |title=Refining Initial Points for ''k''-Means Clustering |book-title=Proceedings of the Fifteenth International Conference on Machine Learning }}</ref> performs among the best and k-means++ performs generally well.
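The two initialization strategies can be illustrated with a minimal pure-Python sketch. This is not code from any cited paper; the function names and the fallback for an empty random cluster are illustrative choices:

```python
import random

def forgy_init(points, k, seed=0):
    """Forgy: pick k distinct observations from the dataset as initial means."""
    rng = random.Random(seed)
    return rng.sample(points, k)

def random_partition_init(points, k, seed=0):
    """Random Partition: randomly assign each point to one of k clusters,
    then use each cluster's centroid as its initial mean."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    means = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            # A randomly assigned cluster can be empty; fall back to a
            # random observation (an illustrative choice, not from the sources).
            means.append(rng.choice(points))
            continue
        dim = len(members[0])
        means.append(tuple(sum(p[d] for p in members) / len(members)
                           for d in range(dim)))
    return means
```

Because each Random Partition centroid averages roughly ''n''/''k'' randomly chosen points, all of its initial means land near the dataset's overall center, whereas Forgy's initial means are scattered like the data itself, matching the contrast described above.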