Changes

Hierarchical clustering (view source)

Revision as of 18:00, 7 October 2020 (Wed)

5 bytes added, 18:00, 7 October 2020 (Wed)

No edit summary

Line 477: Line 477:

For example, suppose this data is to be clustered, and the Euclidean distance is the distance metric.

−

~~例如，假设要对这些数据进行聚类，距离欧几里得度量就是距离度量。~~

+

例如，假设要对这些数据进行聚类，并以欧几里得距离作为距离度量。
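As a minimal sketch of the distance metric named above, the snippet below computes the Euclidean distance between two points. The function name `euclidean` and the coordinates are illustrative assumptions; the data set the article refers to is not reproduced in this revision.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical 2-D coordinates, chosen only for illustration.
points = {"a": (0.0, 0.0), "b": (3.0, 4.0)}
print(euclidean(points["a"], points["b"]))  # 5.0
```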

Line 493: Line 493:

Traditional representation

−

~~传统表现~~

+

传统展现法

Line 501: Line 501:

Cutting the tree at a given height will give a partitioning clustering at a selected precision. In this example, cutting after the second row (from the top) of the dendrogram will yield clusters {a} {b c} {d e} {f}. Cutting after the third row will yield clusters {a} {b c} {d e f}, which is a coarser clustering, with a smaller number but larger clusters.

−

~~在给定的高度切割树将以选定的精度提供分区聚类。在这个示例中，在树状图的第二行~~(从顶部开始)之后切割将产生集群{ a }{ b }{ d }{ f }。在第三行之后进行切割将产生集群{ a }{ b }{ d e f } ~~，这是一个粗糙的集群，具有较小的数量但较大的集群。~~

+

在给定的高度切割树状图，将得到选定精度下的分区聚类。在这个示例中，在树状图的第二行(从顶部开始)之后切割将产生集群 {a} {b c} {d e} {f}。在第三行之后切割将产生集群 {a} {b c} {d e f}，这是一个更粗略的聚类，集群数量更少，但每个集群更大。
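The cut described above can be sketched in a few lines. The merge heights 1 to 5 are hypothetical stand-ins for the rows of the dendrogram (the figure itself is not part of this revision), and `cut_dendrogram` is an illustrative helper, not a function from any library.

```python
def cut_dendrogram(leaves, merges, height):
    """Return the clusters formed by applying only merges at or below `height`.

    `merges` is a list of (height, leaf_x, leaf_y): the clusters currently
    containing leaf_x and leaf_y are joined at that height.
    """
    clusters = {leaf: frozenset([leaf]) for leaf in leaves}
    for h, x, y in sorted(merges):
        if h > height:
            break
        merged = clusters[x] | clusters[y]
        for leaf in merged:
            clusters[leaf] = merged
    # Deduplicate and order clusters by their smallest member.
    return sorted(set(clusters.values()), key=min)


leaves = ["a", "b", "c", "d", "e", "f"]
# Hypothetical heights: one unit per dendrogram row.
merges = [(1, "b", "c"), (2, "d", "e"), (3, "d", "f"), (4, "b", "d"), (5, "a", "b")]

for height in (2, 3):
    print(height, ["".join(sorted(c)) for c in cut_dendrogram(leaves, merges, height)])
# 2 ['a', 'bc', 'de', 'f']
# 3 ['a', 'bc', 'def']
```

Raising the cut height can only join clusters, never split them, which is why the higher cut gives the coarser partition.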

Line 517: Line 517:

Optionally, one can also construct a distance matrix at this stage, where the number in the i-th row j-th column is the distance between the i-th and j-th elements. Then, as clustering progresses, rows and columns are merged as the clusters are merged and the distances updated. This is a common way to implement this type of clustering, and has the benefit of caching distances between clusters. A simple agglomerative clustering algorithm is described in the single-linkage clustering page; it can easily be adapted to different types of linkage (see below).

−

还可以选择在这个阶段构造一个距离矩阵，其中第 i 行 j-th 列中的数字是 i-th 和 j-th 元素之间的距离。然后，随着集群的进展，在合并集群和更新距离时合并行和列。这是实现此类集群的常用方法，并且具有缓存集群之间的距离的优点。在单链接聚类页面中描述了一个简单的凝聚聚类算法; ~~它可以很容易地适应不同类型的链接~~(见下文)。

+

还可以选择在这个阶段构造一个距离矩阵，其中第 i 行第 j 列中的数字是第 i 个和第 j 个元素之间的距离。然后，随着聚类的进行，在合并集群和更新距离时合并相应的行和列。这是实现此类聚类的常用方法，并且具有缓存集群之间距离的优点。单链接聚类页面中描述了一个简单的凝聚聚类算法；它可以很容易地适配不同类型的链接(见下文)。
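The cached distance matrix can be illustrated with a naive agglomerative loop. This is a sketch under stated assumptions, not the algorithm from the single-linkage page: the matrix is stored as a dict of dicts, the name `agglomerate` is made up for this example, and the `update` rule is only valid for single linkage (`min`) and complete linkage (`max`).

```python
import math
from itertools import combinations


def agglomerate(points, update=min):
    """Naive agglomerative clustering on a cached distance matrix.

    `update` combines the two cached distances to a third cluster when two
    clusters merge: `min` gives single linkage, `max` gives complete linkage.
    Returns the merges as (distance, members_a, members_b), in merge order.
    """
    n = len(points)
    # dist[i][j] caches the distance between active clusters i and j.
    dist = {i: {j: math.dist(points[i], points[j]) for j in range(n) if j != i}
            for i in range(n)}
    members = {i: (i,) for i in range(n)}
    merges = []
    while len(members) > 1:
        # Closest pair of active clusters (first one found wins on ties).
        i, j = min(combinations(sorted(members), 2), key=lambda p: dist[p[0]][p[1]])
        merges.append((dist[i][j], members[i], members[j]))
        # Merge row/column j into row/column i, updating the cached distances.
        for k in members:
            if k not in (i, j):
                dist[i][k] = dist[k][i] = update(dist[i][k], dist[j][k])
        # Drop row and column j.
        del dist[j]
        for row in dist.values():
            row.pop(j, None)
        members[i] = members[i] + members[j]
        del members[j]
    return merges


# Hypothetical 1-D data, chosen so the merge distances are easy to check by hand.
pts = [(0.0,), (1.0,), (3.0,), (7.0,)]
print([d for d, _, _ in agglomerate(pts, min)])  # [1.0, 2.0, 4.0]
print([d for d, _, _ in agglomerate(pts, max)])  # [1.0, 3.0, 7.0]
```

Because distances between clusters are read from the cache and updated in place, no point-to-point distance is recomputed after the matrix is built.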

Line 534: Line 534:

* The maximum distance between elements of each cluster (also called [[complete-linkage clustering]]):

−

*~~每个簇元素之间的最大距离（又名~~[[完全链路集]])

+

*每个集群元素之间的最大距离（又名[[完全链路集]]）：

::<math> \max \{\, d(x,y) : x \in \mathcal{A},\, y \in \mathcal{B}\,\}. </math>

Line 542: Line 542:

* The minimum distance between elements of each cluster (also called [[single-linkage clustering]]):

−

*~~每个簇的元素之间的最小距离（也称为~~[[单个链路集]]):

+

*每个集群的元素之间的最小距离（也称为[[单个链路集]]）：

::<math> \min \{\, d(x,y) : x \in \mathcal{A},\, y \in \mathcal{B} \,\}. </math>
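The two formulas above translate directly into code. The sketch below evaluates them by brute force over all cross-cluster pairs; the function names and sample clusters are assumptions made for illustration, and `math.dist` (Euclidean distance) stands in for the generic metric d.

```python
import math


def complete_linkage(A, B, d=math.dist):
    """max { d(x, y) : x in A, y in B }"""
    return max(d(x, y) for x in A for y in B)


def single_linkage(A, B, d=math.dist):
    """min { d(x, y) : x in A, y in B }"""
    return min(d(x, y) for x in A for y in B)


# Hypothetical clusters on a line.
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(4.0, 0.0), (6.0, 0.0)]
print(single_linkage(A, B))    # 3.0
print(complete_linkage(A, B))  # 6.0
```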

Line 558: Line 558:

* The sum of all intra-cluster variance.

−

*~~所有簇内方差之和。~~

+

*所有集群内方差之和。

* The increase in variance for the cluster being merged ([[Ward's method]]<ref name="wards method"/>)

*合并的聚类的方差增加([[离差平方和法]]<ref name="离差平方和法"/>)。
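A minimal sketch of the Ward-style criterion follows. It assumes "variance" is measured as the sum of squared Euclidean deviations from the cluster centroid, which is one common formulation; the helper names are illustrative. The last lines check the result against the closed form |A||B| / (|A| + |B|) times the squared distance between the two centroids.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sse(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)


def ward_increase(A, B):
    """Increase in total within-cluster SSE caused by merging A and B."""
    return sse(A + B) - sse(A) - sse(B)


# Hypothetical 1-D clusters.
A = [(0.0,), (2.0,)]
B = [(5.0,)]
direct = ward_increase(A, B)

# Closed form: |A||B| / (|A| + |B|) * ||centroid(A) - centroid(B)||^2
cA, cB = centroid(A), centroid(B)
closed = len(A) * len(B) / (len(A) + len(B)) * sum((a - b) ** 2 for a, b in zip(cA, cB))
print(abs(direct - closed) < 1e-9)  # True
```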

Line 569: Line 569:

In case of tied minimum distances, a pair is randomly chosen, thus being able to generate several structurally different dendrograms. Alternatively, all tied pairs may be joined at the same time, generating a unique dendrogram.

−

在系结最小距离的情况下，一对是随机选择的，因此能够产生几个结构不同的树状图。或者，所有的绑定对可以在同一时间结合，产生一个唯一的树状图。

+

当最小距离出现并列时，可以随机选择其中一对，因此可能产生几个结构不同的树状图。或者，也可以同时合并所有并列的对，从而产生唯一的树状图。
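To make the tie situation concrete, the sketch below lists every pair that attains the minimum distance. The distance table, the tolerance `tol`, and the function name are assumptions for illustration. With the sample data, merging (a, b) first or (b, c) first leads to structurally different dendrograms.

```python
import random
from itertools import combinations


def tied_closest_pairs(dist, labels, tol=1e-12):
    """Return all pairs of labels whose distance equals the minimum (within tol)."""
    pairs = list(combinations(labels, 2))
    dmin = min(dist[p] for p in pairs)
    return [p for p in pairs if abs(dist[p] - dmin) <= tol]


# Hypothetical distances with a tie between (a, b) and (b, c).
labels = ["a", "b", "c"]
dist = {("a", "b"): 1.0, ("a", "c"): 2.0, ("b", "c"): 1.0}

ties = tied_closest_pairs(dist, labels)
print(ties)  # [('a', 'b'), ('b', 'c')]

# Option 1: pick one tied pair at random (the resulting dendrogram may vary).
chosen = random.choice(ties)

# Option 2: join all tied pairs at once, here giving the single group {a, b, c}.
joined = set().union(*ties)
```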

Line 577: Line 577:

One can always decide to stop clustering when there is a sufficiently small number of clusters (number criterion). Some linkages may also guarantee that agglomeration occurs at a greater distance between clusters than the previous agglomeration, and then one can stop clustering when the clusters are too far apart to be merged (distance criterion). However, this is not the case of, e.g., the centroid linkage where the so-called reversals (inversions, departures from ultrametricity) may occur.

−

~~人们总是可以决定停止群集时，有一个足够少的群集~~(数目标准)。有些联系还可能保证集群之间的距离大于以前的集群，然后当集群之间的距离太远而无法合并时就可以停止集群(距离标准)。然而，这不是例如，质心链接的情况下，所谓的逆转(反转，偏离超节拍)~~可能发生的情况。~~

+

当集群数量足够少时，总可以决定停止聚类(数目标准)。某些链接方式还能保证每次聚合发生时集群之间的距离都大于上一次聚合，这样当集群相距太远而无法合并时就可以停止聚类(距离标准)。然而，质心链接等方式并非如此，其中可能出现所谓的逆转(反转，偏离超度量性)。
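Both stopping rules, and the reversal that breaks the distance criterion, can be sketched as follows. The function names, parameters, and the three sample points are assumptions made for illustration. The points form a nearly equilateral triangle, a standard way to trigger a reversal under centroid linkage.

```python
import math


def merges_to_apply(heights, n_points, target_clusters=None, max_distance=None):
    """Count how many merges to perform before a stopping rule fires.

    heights[i] is the inter-cluster distance at which the i-th merge happens.
    """
    applied = 0
    for h in heights:
        if target_clusters is not None and n_points - applied <= target_clusters:
            break  # number criterion: few enough clusters remain
        if max_distance is not None and h > max_distance:
            break  # distance criterion: clusters too far apart to merge
        applied += 1
    return applied


def has_reversal(heights):
    """True if some merge happens at a smaller distance than the one before it."""
    return any(later < earlier for earlier, later in zip(heights, heights[1:]))


heights = [1.0, 2.0, 4.0]  # hypothetical monotone merge distances for 4 points
print(merges_to_apply(heights, 4, target_clusters=2))  # 2
print(merges_to_apply(heights, 4, max_distance=3.0))   # 2

# Centroid linkage on a nearly equilateral triangle (hypothetical points).
a, b, c = (0.0, 0.0), (2.0, 0.0), (1.0, 1.8)
first = math.dist(a, b)                      # a and b are the closest pair: 2.0
ab_centroid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
second = math.dist(ab_centroid, c)           # 1.8, smaller than the first merge
print(has_reversal([first, second]))         # True
```

When `has_reversal` is true, a distance threshold no longer corresponds to a clean horizontal cut of the dendrogram, which is why the distance criterion is only safe for linkages with monotone merge distances.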

== Divisive clustering 分裂聚类 ==

CecileLi

526 edits
