更改

K-means聚类 k-means clustering (查看源代码)

2020年4月22日 (三) 00:08的版本

删除42字节、 2020年4月22日 (三) 00:08

第84行：第84行：

最常见的算法使用迭代优化技术。由于它的普遍性，它通常被称为“ k-means算法”,也被称为[[劳埃德算法 Lloyd's algorithm]]，特别是在计算机科学界。有时也称为“朴素k-means”，因为存在更快的替代方法。<ref>{{Cite journal|last=Pelleg|first=Dan|last2=Moore|first2=Andrew|date=1999|title=Accelerating exact k -means algorithms with geometric reasoning|url=http://portal.acm.org/citation.cfm?doid=312129.312248|journal=Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '99|language=en|location=San Diego, California, United States|publisher=ACM Press|pages=277–281|doi=10.1145/312129.312248|isbn=9781581131437}}</ref>

−

给定一个初始的k均值m1（1），...，mk（1）（见下文），该算法通过以下两个步骤交替进行：<ref>{{Cite book |url=http://www.inference.phy.cam.ac.uk/mackay/itila/book.html |title=Information Theory, Inference and Learning Algorithms |last=MacKay |first=David |publisher=Cambridge University Press |year=2003 |isbn=978-0-521-64298-9 |pages=284–292 |chapter=Chapter 20. An Example Inference Task: Clustering |mr=2012999 |ref=mackay2003 ~~|authorlink=David MacKay (scientist)~~ |chapterurl=http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/284.292.pdf }}</ref>

+

给定一个初始的k均值m1（1），...，mk（1）（见下文），该算法通过以下两个步骤交替进行：<ref>{{Cite book |url=http://www.inference.phy.cam.ac.uk/mackay/itila/book.html |title=Information Theory, Inference and Learning Algorithms |last=MacKay |first=David |publisher=Cambridge University Press |year=2003 |isbn=978-0-521-64298-9 |pages=284–292 |chapter=Chapter 20. An Example Inference Task: Clustering |mr=2012999 |ref=mackay2003 |chapterurl=http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/284.292.pdf }}</ref>

第99行：第99行：

::<math>m^{(t+1)}_i = \frac{1}{\left|S^{(t)}_i\right|} \sum_{x_j \in S^{(t)}_i} x_j </math>。

−

当分配不再更改时，该算法已收敛。该算法不能保证找到最佳值。<ref name="hartigan19792">{{Cite journal |last1=Hartigan |first1=J. A. |last2=Wong |first2=M. A. |year=1979 |title=Algorithm AS 136: A ''k''-Means Clustering Algorithm |journal=[[Journal of the Royal Statistical Society, Series C]] |volume=28 |issue=1 |pages=100–108 |jstor=2346830 }}</ref>

+

当分配不再更改时，该算法已收敛。该算法不能保证找到最佳值。<ref name="hartigan19792">{{Cite journal |last1=Hartigan |first1=J. A. |last2=Wong |first2=M. A. |year=1979 |title=Algorithm AS 136: A ''k''-Means Clustering Algorithm |journal=Journal of the Royal Statistical Society, Series C |volume=28 |issue=1 |pages=100–108 |jstor=2346830 }}</ref>

该算法通常表示为按距离将对象分配给最近的群集。使用除（平方）[[欧氏距离]]以外的其他距离函数可能会阻止算法收敛。已经提出的改进的k-means算法，如[[spherical k-means]]和[[k-medoids]]，使用了其他距离度量。

第119行：第119行：

“分配”步骤被称为“期望步骤”，而“更新步骤”是最大化步骤，这使得该算法成为广义期望最大化算法的变体。

−

===变体===

薄荷

7,129

个编辑

更改

K-means聚类 k-means clustering (查看源代码)

2020年4月22日 (三) 00:08的版本

导航菜单

搜索