* Some methods attempt to speed up each k-means step using the triangle inequality.<ref name="phillips2" /><ref name="elkan2" /><ref name="hamerly22" /><ref>{{Cite journal |last=Drake |first=Jonathan |date=2012 |title=Accelerated ''k''-means with adaptive distance bounds |url=http://opt.kyb.tuebingen.mpg.de/papers/opt2012_paper_13.pdf |journal=The 5th NIPS Workshop on Optimization for Machine Learning, OPT2012 }}</ref><ref name="hamerly32" />
* Escaping local optima by swapping points between clusters.<ref name="hartigan19792" />
* The [[Spherical k-means clustering]] algorithm is suitable for text data.<ref>{{Cite journal |last1=Dhillon |first1=I. S. |last2=Modha |first2=D. M. |year=2001 |title=Concept decompositions for large sparse text data using clustering |journal=Machine Learning |volume=42 |issue=1 |pages=143–175 |doi=10.1023/a:1007612920971 |doi-access=free }}</ref>
* Hierarchical variants such as [[Bisecting k-means]],<ref>{{cite journal | last1 = Steinbach | first1 = M. | last2 = Karypis | first2 = G. | last3 = Kumar | first3 = V. | year = 2000 | title = A comparison of document clustering techniques | journal = KDD Workshop on Text Mining | volume = 400 | issue = 1 | pages = 525–526 }}</ref> [[X-means clustering]]<ref>Pelleg, D.; & Moore, A. W. (2000, June). "[http://cs.uef.fi/~zhao/Courses/Clustering2012/Xmeans.pdf X-means: Extending ''k''-means with Efficient Estimation of the Number of Clusters]". In ''ICML'', Vol. 1</ref> and [[G-means clustering]]<ref>{{cite journal | last1 = Hamerly | first1 = Greg | last2 = Elkan | first2 = Charles | year = 2004 | title = | journal = Advances in Neural Information Processing Systems | volume = 16 | page = 281 }}</ref> repeatedly split clusters to build a hierarchy, and can also attempt to automatically determine the optimal number of clusters in a dataset.
* Internal cluster evaluation measures such as cluster silhouette can be helpful in determining the number of clusters.
* [[Minkowski weighted k-means]]: automatically computes cluster-specific feature weights, supporting the intuitive idea that a feature may have different degrees of relevance in different clusters.<ref>{{Cite journal |last1=Amorim |first1=R. C. |last2=Mirkin |first2=B. |year=2012 |title=Minkowski Metric, Feature Weighting and Anomalous Cluster Initialisation in ''k''-Means Clustering |journal=Pattern Recognition |volume=45 |issue=3 |pages=1061–1075 |doi=10.1016/j.patcog.2011.08.012 }}</ref> These weights can also be used to rescale a given dataset, increasing the likelihood that a cluster validity index is optimized at the expected number of clusters.<ref>{{Cite journal |last1=Amorim |first1=R. C. |last2=Hennig |first2=C. |year=2015 |title=Recovering the number of clusters in data sets with noise features using feature rescaling factors |journal=Information Sciences |volume=324 |pages=126–145 |arxiv=1602.06989 |doi=10.1016/j.ins.2015.06.039 }}</ref>
* Mini-batch k-means: uses "mini batch" samples for datasets that do not fit into memory.<ref>{{Cite conference |last=Sculley |first=David |date=2010 |title=Web-scale ''k''-means clustering |url=http://dl.acm.org/citation.cfm?id=1772862 |publisher=ACM |pages=1177–1178 |accessdate=2016-12-21 |booktitle=Proceedings of the 19th international conference on World Wide Web }}</ref>
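The mini-batch idea in the last item can be sketched briefly: each iteration draws a random sample of points, assigns them to their nearest centres, and nudges each centre toward its assigned points with a per-centre learning rate that shrinks as the centre accumulates points. The sketch below is a minimal illustration under these assumptions (the function name `mini_batch_kmeans` and its parameters are hypothetical, not an existing library API):

```python
import numpy as np

def mini_batch_kmeans(X, k, batch_size=100, n_iters=100, seed=0):
    """Minimal mini-batch k-means sketch: stochastic centre updates
    on small random samples instead of full-dataset passes."""
    rng = np.random.default_rng(seed)
    # Initialize centres with k distinct random points from X.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    counts = np.zeros(k)  # how many points each centre has absorbed
    for _ in range(n_iters):
        batch = X[rng.choice(len(X), size=batch_size)]
        # Assign each batch point to its nearest centre (squared Euclidean).
        d = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for x, c in zip(batch, labels):
            counts[c] += 1
            eta = 1.0 / counts[c]  # per-centre learning rate decays over time
            centers[c] = (1 - eta) * centers[c] + eta * x
    return centers
```

The `1/count` learning rate makes each centre a running mean of the points ever assigned to it, which is what lets the method approximate full k-means while touching only `batch_size` points per iteration.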
===Hartigan-Wong method===