
Can I use a custom distance measure for the kmeans function?

I am using the kmeans function to perform K-means clustering.

I have special data that needs a custom distance measure function and a custom mean function.

Can I pass (1) a custom distance measure function and (2) a custom mean function to the kmeans function?

It seems to use the Euclidean measure only.

The standard kmeans does not allow this, for good reasons. It uses some clever algorithms (Hartigan and Wong, which is why it is much faster than the standard Lloyd textbook algorithm you find in about 100 other R packages). But these only work for the classic k-means scenario with squared deviations (which means assigning each point to the Euclidean nearest center, although what it actually optimizes is least squares, not Euclidean distances).
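To illustrate the least-squares point, here is a minimal sketch in Python. The one-dimensional toy data and the helper names sum_squared and sum_absolute are made up for illustration. It checks that the arithmetic mean is the better center for squared deviations, while the median is the better center for plain (unsquared) distances.

```python
from statistics import mean, median

# Toy 1-D cluster, chosen purely for illustration.
points = [0.0, 0.0, 10.0]

def sum_squared(center, pts):
    """Sum of squared deviations (the k-means objective for one cluster)."""
    return sum((p - center) ** 2 for p in pts)

def sum_absolute(center, pts):
    """Sum of plain, unsquared distances to the center."""
    return sum(abs(p - center) for p in pts)

m = mean(points)
med = median(points)

# The mean gives the smaller sum of squared deviations ...
print(sum_squared(m, points) < sum_squared(med, points))
# ... but the median gives the smaller sum of plain distances.
print(sum_absolute(med, points) < sum_absolute(m, points))
```

So using the mean as the center goes hand in hand with the squared objective. Swapping the distance without also swapping the center function can leave the update step no longer improving the thing you measure.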

I doubt you can simply plug other distances and centroid functions into the Hartigan and Wong method (quite apart from it being written in Fortran, so you cannot just plug in an R function there anyway).
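If you really need both pieces to be pluggable, the usual fallback is a plain Lloyd-style alternation written by hand. The following is only a sketch of that idea in Python, not how R's kmeans works. The function lloyd, its parameters, and the helpers manhattan and componentwise_median are hypothetical names introduced here. It carries no convergence guarantee for arbitrary distance and center pairs, which is why the loop is capped by max_iter.

```python
import random
from statistics import median

def lloyd(points, k, distance, center, max_iter=100, seed=0):
    """Lloyd-style alternation with user-supplied distance and center functions.

    Hypothetical helper: convergence is NOT guaranteed for arbitrary
    distance/center pairs, so the loop is capped by max_iter.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, stop
            break
        labels = new_labels
        # Update step: recompute each center from its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster went empty
                centers[j] = center(members)
    return labels, centers

def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))

def componentwise_median(members):
    """Component-wise median, the center that pairs with Manhattan distance."""
    return tuple(median(col) for col in zip(*members))

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = lloyd(data, 2, manhattan, componentwise_median)
print(labels, centers)
```

The example pairs Manhattan distance with the component-wise median because that center minimizes the summed Manhattan distance within a cluster. With an arbitrary pairing, neither step is guaranteed to reduce a common objective.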

Beware that there are very few combinations of other distances and means that are known to always converge well. Bregman divergences should be fine, and cosine is equivalent to squared Euclidean on a sphere, so it will also work.
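The cosine remark can be checked numerically. For vectors scaled to unit length, the squared Euclidean distance equals 2 * (1 - cosine similarity). The sketch below uses arbitrary example vectors and helper names (normalize, cosine_distance, squared_euclidean) that are assumptions for illustration only.

```python
import math

def normalize(v):
    """Scale a vector to unit length (project it onto the sphere)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def squared_euclidean(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Arbitrary example vectors.
a, b = [3.0, 4.0], [1.0, 2.0]

lhs = squared_euclidean(normalize(a), normalize(b))
rhs = 2.0 * cosine_distance(a, b)
print(math.isclose(lhs, rhs))
```

In practice this means you can normalize each row to unit length and then run ordinary k-means on the result. Strictly spherical k-means would also re-normalize the centers after each update.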
