更改

跳到导航 跳到搜索
添加28字节 、 2020年10月7日 (三) 14:05
无编辑摘要
第27行: 第27行:  
In general, the merges and splits are determined in a greedy manner. The results of hierarchical clustering<ref>{{cite book | author=Frank Nielsen | title=Introduction to HPC with MPI for Data Science |  year=2016 | publisher=Springer |
 
In general, the merges and splits are determined in a greedy manner. The results of hierarchical clustering<ref>{{cite book | author=Frank Nielsen | title=Introduction to HPC with MPI for Data Science |  year=2016 | publisher=Springer |
   −
此处翻译编辑视图内有显示阅读视图中无。
+
补充:此处翻译编辑视图内有显示阅读视图中无。
 +
 
 
一般来说,合并和分裂是以使用者希望的方式决定的。'''<font color="#ff8000"> 而层次聚类Hierarchical clustering</font>'''的结果 < ref > { cite book | author = Frank Nielsen | title = Introduction to HPC with MPI for Data Science | year = 2016 | publisher = Springer |  
 
一般来说,合并和分裂是以使用者希望的方式决定的。'''<font color="#ff8000"> 而层次聚类Hierarchical clustering</font>'''的结果 < ref > { cite book | author = Frank Nielsen | title = Introduction to HPC with MPI for Data Science | year = 2016 | publisher = Springer |  
   第42行: 第43行:  
The standard algorithm for hierarchical agglomerative clustering (HAC) has a time complexity of <math>\mathcal{O}(n^3)</math> and requires <math>\mathcal{O}(n^2)</math> memory, which makes it too slow for even medium data sets. However, for some special cases, optimal efficient agglomerative methods (of complexity <math>\mathcal{O}(n^2)</math>) are known: SLINK<!--boldface per WP:R#PLA--> for single-linkage and CLINK for complete-linkage clustering. With a heap the runtime of the general case can be reduced to <math>\mathcal{O}(n^2 \log n)</math> at the cost of further increasing the memory requirements. In many cases, the memory overheads of this approach are too large to make it practically usable.
 
The standard algorithm for hierarchical agglomerative clustering (HAC) has a time complexity of <math>\mathcal{O}(n^3)</math> and requires <math>\mathcal{O}(n^2)</math> memory, which makes it too slow for even medium data sets. However, for some special cases, optimal efficient agglomerative methods (of complexity <math>\mathcal{O}(n^2)</math>) are known: SLINK<!--boldface per WP:R#PLA--> for single-linkage and CLINK for complete-linkage clustering. With a heap the runtime of the general case can be reduced to <math>\mathcal{O}(n^2 \log n)</math> at the cost of further increasing the memory requirements. In many cases, the memory overheads of this approach are too large to make it practically usable.
   −
'''<font color="#ff8000"> 层次凝聚聚类Hierarchical agglomerative clustering</font>'''(HAC)的标准算法的时间复杂度为 < math > mathical { o }(n ^ 3) </math > ,并且需要 < math > mathcal { o }(n ^ 2) </math > 内存,这使得它对于中等数据集来说太慢了。然而,对于某些特殊情况,已知的最佳有效凝聚方法(复杂度 < math > mathcal { o }(n ^ 2) </math >)是: 单连锁的 SLINK < ! ——粗体 wp: r # pla-- > 和完全连锁的 CLINK。对于堆,一般情况下的运行时可以缩减为 < math > mathcal { o }(n ^ 2 log n) </math > ,代价是进一步增加内存需求。在许多情况下,这种方法的内存开销太大,无法实际使用。
+
'''<font color="#ff8000"> 层次凝聚聚类Hierarchical agglomerative clustering</font>'''(HAC)的标准算法的时间复杂度为 < math > mathical { o }(n ^ 3) </math > ,并且需要 < math > mathcal { o }(n ^ 2) </math > 占用内存,这使得它对于中等数据集来说效率太低了。然而,对于某些特殊情况,已知的最佳有效凝聚方法(复杂度 < math > mathcal { o }(n ^ 2) </math >)是: 单连锁的 SLINK < ! ——粗体 wp: r # pla-- > 和完全连锁的 CLINK。对于数据群而言,一般情况下的运行时可以缩减为 < math > mathcal { o }(n ^ 2 log n) </math > ,代价是进一步增加内存需求。在许多情况下,这种方法的内存开销太大,并不实用。
     
526

个编辑

导航菜单