Changes

Hierarchical clustering

Revision as of 14:05, 7 October 2020 (Wednesday)

Added 28 bytes, 14:05, 7 October 2020 (Wednesday)

No edit summary

Line 27: Line 27:

In general, the merges and splits are determined in a greedy manner. The results of hierarchical clustering<ref>{{cite book | author=Frank Nielsen | title=Introduction to HPC with MPI for Data Science | year=2016 | publisher=Springer |

−

~~This passage appears in the translation edit view but not in the reading view.~~

+

Note: this passage appears in the translation edit view but not in the reading view.

+

In general, the merges and splits are determined in a greedy manner. The results of '''hierarchical clustering'''<ref>{{cite book | author=Frank Nielsen | title=Introduction to HPC with MPI for Data Science | year=2016 | publisher=Springer |

Line 42: Line 43:

The standard algorithm for hierarchical agglomerative clustering (HAC) has a time complexity of <math>\mathcal{O}(n^3)</math> and requires <math>\mathcal{O}(n^2)</math> memory, which makes it too slow for even medium data sets. However, for some special cases, optimal efficient agglomerative methods (of complexity <math>\mathcal{O}(n^2)</math>) are known: SLINK for single-linkage and CLINK for complete-linkage clustering. With a heap the runtime of the general case can be reduced to <math>\mathcal{O}(n^2 \log n)</math> at the cost of further increasing the memory requirements. In many cases, the memory overheads of this approach are too large to make it practically usable.
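The greedy procedure and the <math>\mathcal{O}(n^3)</math> bound can be illustrated with a minimal Python sketch. It assumes a precomputed symmetric distance matrix and supports only single and complete linkage, which are updated with min and max, a special case of the Lance-Williams formula. The function name `naive_hac` and its return format are hypothetical and chosen for illustration.

```python
from typing import List, Tuple


def naive_hac(dist: List[List[float]], linkage: str = "single") -> List[Tuple[int, int, float]]:
    """Greedy agglomerative clustering on a full, symmetric distance matrix.

    Each of the n - 1 merge steps scans all active pairs for the closest one,
    so the whole run costs O(n^3) time; the working matrix needs O(n^2) memory.
    Returns merges as (a, b, height): cluster b is merged into cluster a, and a
    cluster is named by the smallest original point index it contains.
    (Hypothetical helper for illustration, not a library API.)
    """
    if linkage not in ("single", "complete"):
        raise ValueError("linkage must be 'single' or 'complete'")
    # Single linkage keeps the smaller of the two old distances, complete the larger.
    combine = min if linkage == "single" else max
    n = len(dist)
    d = [list(row) for row in dist]  # working copy: O(n^2) memory
    active = set(range(n))
    merges: List[Tuple[int, int, float]] = []
    while len(active) > 1:
        order = sorted(active)
        # Greedy step: pick the globally closest pair of active clusters (O(n^2) scan).
        best = None
        for x, i in enumerate(order):
            for j in order[x + 1:]:
                if best is None or d[i][j] < best[2]:
                    best = (i, j, d[i][j])
        i, j, height = best
        merges.append((i, j, height))
        # Merge j into i and update distances from the merged cluster to the rest.
        for k in order:
            if k != i and k != j:
                d[i][k] = d[k][i] = combine(d[i][k], d[j][k])
        active.remove(j)
    return merges
```

For the points 0, 1, 5 and 7 on a line, single linkage merges at heights 1, 2 and 4, while complete linkage merges at heights 1, 2 and 7.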

−

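The standard algorithm for '''hierarchical agglomerative clustering''' (HAC) has a time complexity of <math>\mathcal{O}(n^3)</math> and requires <math>\mathcal{O}(n^2)</math> ~~memory, which makes it too slow for medium-sized data sets. However, for some special cases, the known optimally efficient agglomerative methods~~ (of complexity <math>\mathcal{O}(n^2)</math>) are: SLINK for single-linkage and ~~CLINK for complete-linkage clustering. With a heap, the runtime of the general case can be reduced to~~ <math>\mathcal{O}(n^2 \log n)</math>~~, at the cost of further increasing the memory requirements. In many cases, the memory overhead of this approach is too large for it to be usable in practice.~~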

+

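The standard algorithm for '''hierarchical agglomerative clustering''' (HAC) has a time complexity of <math>\mathcal{O}(n^3)</math> and requires <math>\mathcal{O}(n^2)</math> memory, which makes it too inefficient even for medium-sized data sets. However, for some special cases, optimally efficient agglomerative methods (of complexity <math>\mathcal{O}(n^2)</math>) are known: SLINK for single-linkage and CLINK for complete-linkage clustering. With a heap, the runtime of the general case can be reduced to <math>\mathcal{O}(n^2 \log n)</math>, at the cost of further increasing the memory requirements. In many cases, the memory overhead of this approach is too large for it to be practical.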
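The <math>\mathcal{O}(n^2)</math> special case for single linkage can be sketched in the style of Sibson's SLINK. SLINK builds the dendrogram incrementally in a "pointer representation": for each point j, `lam[j]` is the height at which j stops being the last point of its cluster, and `pi[j]` is the largest-index point of the cluster it joins at that height. The following minimal Python version assumes distances are supplied by a callback. The function name `slink` and the callback signature are illustrative assumptions, not a library API.

```python
import math
from typing import Callable, List, Tuple


def slink(n: int, dist: Callable[[int, int], float]) -> Tuple[List[int], List[float]]:
    """Single-linkage clustering in pointer representation (SLINK-style sketch).

    pi[j]:  largest-index point of the cluster that j joins at height lam[j]
    lam[j]: merge height for j; math.inf for the final point
    Uses O(n^2) distance evaluations and O(n) working memory besides the callback.
    """
    pi = [0] * n
    lam = [math.inf] * n
    m = [0.0] * n  # distances from the new point i to the points already processed
    for i in range(n):
        # Point i starts as its own, not-yet-merged cluster.
        pi[i] = i
        lam[i] = math.inf
        for j in range(i):
            m[j] = dist(j, i)
        # Fold point i into the pointer representation of points 0..i-1.
        for j in range(i):
            if lam[j] >= m[j]:
                m[pi[j]] = min(m[pi[j]], lam[j])
                lam[j] = m[j]
                pi[j] = i
            else:
                m[pi[j]] = min(m[pi[j]], m[j])
        # If j's target now merges no higher than j itself, j should point at i.
        for j in range(i):
            if lam[j] >= lam[pi[j]]:
                pi[j] = i
    return pi, lam
```

The sorted finite `lam` values are the merge heights of the single-linkage dendrogram. For the points 0, 1, 5 and 7 on a line, the sketch returns `pi == [1, 3, 3, 3]` and merge heights 1, 2 and 4.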

CecileLi

526 edits
