更改

层次聚类 (查看源代码)

2020年10月28日 (三) 20:09的版本

删除58字节、 2020年10月28日 (三) 20:09

无编辑摘要

第2行：第2行： −

在[[数据挖掘]]和[[统计学]]中，'''~~<font color="#ff8000">~~ 层次聚类 Hierarchical clustering~~</font>~~'''(~~也被称为“层次数据聚类或”“HCA”~~)是一种通过建立一个集群层次结构来[[聚类分析]]的方法。实现层次聚类的方法通常有两种:<ref name="clusteringMethods">Rokach, Lior, and Oded Maimon. "Clustering methods." Data mining and knowledge discovery handbook. Springer US, 2005. 321-352.</ref>

+

在[[数据挖掘]]和[[统计学]]中，'''层次聚类 Hierarchical clustering （HCA）'''(也被称为“层次数据聚类”)是一种通过建立一个集群层次结构来[[聚类分析]]的方法。实现层次聚类的方法通常有两种:<ref name="clusteringMethods">Rokach, Lior, and Oded Maimon. "Clustering methods." Data mining and knowledge discovery handbook. Springer US, 2005. 321-352.</ref>

* '''凝聚聚类 Agglomerative'''：这是一种“自上而下又自下而上/纵向”的方法：每个被观察数据从自己的簇类中开始，当一个观察组数据向上层移动时，成对的簇类集群被合并。

第13行：第13行： −

'''~~<font color="#ff8000">~~ 层次凝聚聚类 Hierarchical agglomerative ~~clustering</font>~~'''~~（HAC）的标准算法的~~[[时间复杂度]]为<math>\mathcal{O}(n^3)</math> ，并且需要 <math>\mathcal{O}(n^2)</math> ~~占用内存，这使得它对于中等数据集来说效率太低了。然而，对于某些特殊情况，已知的最有效凝聚方法(复杂度~~ <math>\mathcal{O}(n^2)</math>)是: SLINK 用于单连接<ref name="SLINK">{{cite journal | author=R. Sibson | title=SLINK: an optimally efficient algorithm for the single-link cluster method | journal=The Computer Journal | volume=16 | issue=1 | pages=30–34 | year=1973 | publisher=British Computer Society | url=http://www.cs.gsu.edu/~wkim/index_files/papers/sibson.pdf | doi=10.1093/comjnl/16.1.30}}</ref> for [[Single-linkage clustering|single-linkage]]， CLINK用于完全连接<ref name="CLINK">{{cite journal | author=D. Defays | title=An efficient algorithm for a complete-link method | journal=The Computer Journal | volume=20 | issue=4 | pages=364–366 | year=1977 | publisher=British Computer Society | url=http://comjnl.oxfordjournals.org/content/20/4/364.abstract | doi=10.1093/comjnl/20.4.364| doi-access=free }}</ref>[[complete-linkage clustering]]。一般情况下的运行时可以缩减为 <math>\mathcal{O}(n^2 \log n)</math> ，代价是进一步增加内存需求。在多数情况下，这种方法的内存消耗太大，并不实用。

+

'''层次凝聚聚类 Hierarchical agglomerative clustering（HAC）'''的标准算法的[[时间复杂度]]为<math>\mathcal{O}(n^3)</math> ，并且需要 <math>\mathcal{O}(n^2)</math> 占用内存，这使得它对于中等数据集来说效率太低了。然而，对于某些特殊情况，已知的最有效凝聚方法（复杂度 <math>\mathcal{O}(n^2)</math>）是: SLINK 用于单连接<ref name="SLINK">{{cite journal | author=R. Sibson | title=SLINK: an optimally efficient algorithm for the single-link cluster method | journal=The Computer Journal | volume=16 | issue=1 | pages=30–34 | year=1973 | publisher=British Computer Society | url=http://www.cs.gsu.edu/~wkim/index_files/papers/sibson.pdf | doi=10.1093/comjnl/16.1.30}}</ref> for [[Single-linkage clustering|single-linkage]]， CLINK用于完全连接<ref name="CLINK">{{cite journal | author=D. Defays | title=An efficient algorithm for a complete-link method | journal=The Computer Journal | volume=20 | issue=4 | pages=364–366 | year=1977 | publisher=British Computer Society | url=http://comjnl.oxfordjournals.org/content/20/4/364.abstract | doi=10.1093/comjnl/20.4.364| doi-access=free }}</ref>[[complete-linkage clustering]]。一般情况下的运行时可以缩减为 <math>\mathcal{O}(n^2 \log n)</math> ，代价是进一步增加内存需求。在多数情况下，这种方法的内存消耗太大，并不实用。

薄荷

7,129

个编辑