Changes

K-means clustering

Revision as of 23:44, 21 April 2020 (Tue)

247 bytes added, 23:44, 21 April 2020 (Tue)

Line 106: Line 106:

'''Initialization methods'''

−

Commonly used initialization methods are Forgy and Random Partition.<ref name="hamerly4">{{Cite conference |last1=Hamerly |first1=Greg |last2=Elkan |first2=Charles |year=2002 |title=Alternatives to the ''k''-means algorithm that find better clusterings |url=http://people.csail.mit.edu/tieu/notebook/kmeans/15_p600-hamerly.pdf |booktitle=Proceedings of the eleventh international conference on Information and knowledge management (CIKM) }}</ref> The Forgy method randomly chooses k observations from the data set and uses these as the initial means. The Random Partition method first randomly assigns a cluster to each observation and then proceeds to the update step, so that the initial mean of each cluster is the centroid of its randomly assigned points. The Forgy method tends to spread the initial means out, while Random Partition places all of them close to the center of the data set. According to Hamerly et al.,<ref name="hamerly4"></ref> the Random Partition method is generally preferable for algorithms such as the [[K调和均值 K-harmonic means|k-harmonic means]] and [[模糊k均值 fuzzy k-means|fuzzy k-means]]. For expectation maximization and standard k-means algorithms, the Forgy method of initialization is preferable.<ref>{{cite journal |last1=Celebi |first1=M. E. |last2=Kingravi |first2=H. A. |last3=Vela |first3=P. A. |year=2013 |title=A comparative study of efficient initialization methods for the ''k''-means clustering algorithm |journal=[[Expert Systems with Applications]] |volume=40 |issue=1 |pages=200–210 |arxiv=1209.1960 |doi=10.1016/j.eswa.2012.07.021 }}</ref> A comprehensive study by Celebi et al.,<ref>{{Cite conference |last1=Bradley |first1=Paul S. |last2=Fayyad |first2=Usama M. |author-link2=Usama Fayyad |year=1998 |title=Refining Initial Points for ''k''-Means Clustering |book-title=Proceedings of the Fifteenth International Conference on Machine Learning }}</ref> however, found that popular initialization methods such as Forgy, Random Partition, and Maximin often perform poorly, whereas the approach proposed by Bradley and Fayyad[12] performs consistently well and k-means++ generally performs well.

+

Commonly used initialization methods are Forgy and Random Partition.<ref name="hamerly4">{{Cite conference |last1=Hamerly |first1=Greg |last2=Elkan |first2=Charles |year=2002 |title=Alternatives to the ''k''-means algorithm that find better clusterings |url=http://people.csail.mit.edu/tieu/notebook/kmeans/15_p600-hamerly.pdf |booktitle=Proceedings of the eleventh international conference on Information and knowledge management (CIKM) }}</ref> The Forgy method randomly chooses k observations from the data set and uses these as the initial means. The Random Partition method first randomly assigns a cluster to each observation and then proceeds to the update step, so that the initial mean of each cluster is the centroid of its randomly assigned points. The Forgy method tends to spread the initial means out, while Random Partition places all of them close to the center of the data set. According to Hamerly et al.,<ref name="hamerly4"></ref> the Random Partition method is generally preferable for algorithms such as the [[K调和均值 K-harmonic means|k-harmonic means]] and [[模糊k均值 fuzzy k-means|fuzzy k-means]]. For expectation maximization and standard k-means algorithms, the Forgy method of initialization is preferable. A comprehensive study by Celebi et al.,<ref>{{cite journal |last1=Celebi |first1=M. E. |last2=Kingravi |first2=H. A. |last3=Vela |first3=P. A. |year=2013 |title=A comparative study of efficient initialization methods for the ''k''-means clustering algorithm |journal=[[Expert Systems with Applications]] |volume=40 |issue=1 |pages=200–210 |arxiv=1209.1960 |doi=10.1016/j.eswa.2012.07.021 }}</ref> however, found that popular initialization methods such as Forgy, Random Partition, and Maximin often perform poorly, whereas the approach proposed by Bradley and Fayyad<ref>{{Cite conference |last1=Bradley |first1=Paul S. |last2=Fayyad |first2=Usama M. |year=1998 |title=Refining Initial Points for ''k''-Means Clustering |book-title=Proceedings of the Fifteenth International Conference on Machine Learning }}</ref> performs consistently well and k-means++ generally performs well.
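As a rough illustration of the two initialization methods described above, the following pure-Python sketch implements both for points given as tuples of floats. The function names (`centroid`, `forgy_init`, `random_partition_init`) are hypothetical, and forcing one observation into each cluster so that Random Partition never produces an empty cluster is an assumption of this sketch, not part of the method as described.

```python
import math
import random
from typing import List, Tuple

# A point is a tuple of coordinates, e.g. (x, y).
Point = Tuple[float, ...]


def centroid(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def forgy_init(data: List[Point], k: int, rng: random.Random) -> List[Point]:
    """Forgy: pick k observations at random and use them directly as the initial means."""
    return rng.sample(data, k)


def random_partition_init(data: List[Point], k: int, rng: random.Random) -> List[Point]:
    """Random Partition: give every observation a random cluster label, then apply
    the update step, i.e. use the centroid of each cluster as its initial mean."""
    labels = [rng.randrange(k) for _ in data]
    # Assumption of this sketch: put one randomly chosen observation into each
    # cluster so that no cluster is empty (requires len(data) >= k).
    for cluster, index in enumerate(rng.sample(range(len(data)), k)):
        labels[index] = cluster
    clusters: List[List[Point]] = [[] for _ in range(k)]
    for point, label in zip(data, labels):
        clusters[label].append(point)
    return [centroid(members) for members in clusters]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.random(), rng.random()) for _ in range(1000)]
    center = centroid(data)
    for name, init in (("Forgy", forgy_init), ("Random Partition", random_partition_init)):
        means = init(data, 3, rng)
        # Distance of each initial mean from the centroid of the whole data set.
        print(name, [round(math.dist(m, center), 3) for m in means])
```

On uniformly distributed data like this, the Random Partition means should typically lie much closer to the data centroid than the Forgy means, because each one averages hundreds of randomly chosen points. This matches the remark that Random Partition places the means near the center of the data set while Forgy spreads them out.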

−

File:K_Means_Example_Step_4.svg.png|1. k initial "means" (in this case k = 3) are randomly generated within the data domain (shown in color).

+

File:K_Means_Example_Step_4.svg.png|1. <math> k </math> initial "means" (in this case <math> k = 3 </math>) are randomly generated within the data domain (shown in color).

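File:K_Means_Example_Step_2.svg.png|2. <math> k </math> clusters are created by associating every observation with the nearest mean. The partitions here represent the Voronoi diagram generated by the means.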

−

File:K_Means_Example_Step_3.svg.png|3. The centroid of each of the k clusters becomes the new mean.

+

File:K_Means_Example_Step_3.svg.png|3. The centroid of each of the <math> k </math> clusters becomes the new mean.

File:K_Means_Example_Step_4.svg.png|4. Steps 2 and 3 are repeated until convergence has been reached.

</gallery>
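The four steps in the gallery describe the standard k-means iteration. The following pure-Python sketch is one way to write them down. It assumes Euclidean distance, reads "randomly generated within the data domain" as drawing uniformly from the axis-aligned bounding box of the data, keeps a cluster's previous mean if the cluster becomes empty, and treats "convergence" as the point at which the cluster assignment stops changing. The names `kmeans` and `nearest` and the `max_iter` cap are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, means: List[Point]) -> int:
    """Index of the mean closest to `point` under Euclidean distance."""
    return min(range(len(means)), key=lambda j: math.dist(point, means[j]))


def kmeans(data: List[Point], k: int, rng: random.Random,
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    # Step 1: draw k initial means uniformly from the bounding box of the data
    # (our reading of "randomly generated within the data domain").
    lows = [min(c) for c in zip(*data)]
    highs = [max(c) for c in zip(*data)]
    means: List[Point] = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                          for _ in range(k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Step 2: associate every observation with its nearest mean; the clusters
        # are the cells of the Voronoi diagram of the current means.
        new_labels = [nearest(p, means) for p in data]
        # Step 4: converged once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: the centroid of each cluster becomes its new mean.
        for j in range(k):
            members = [p for p, label in zip(data, labels) if label == j]
            if members:  # assumption: an empty cluster keeps its previous mean
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, labels


if __name__ == "__main__":
    rng = random.Random(0)
    blob_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
    blob_b = [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(50)]
    means, labels = kmeans(blob_a + blob_b, 2, rng)
    print("final means:", [tuple(round(x, 2) for x in m) for m in means])
```

The function returns the final means together with, for each observation, the index of the cluster it belongs to. Because step 1 is random, different seeds can converge to different local optima, which is why the choice of initialization method discussed above matters.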

薄荷

7,129 edits
