How to calculate distance between cluster centres of kmeans and choose the minimum in Python?
I have run a kmeans algorithm using sklearn.cluster.KMeans and saved the results in the object kmeans_results.
I can do cl_centers = kmeans_results.cluster_centers_ in order to obtain the cluster centers.
cl_centers looks like this:
array([[0.69332691, 0.9118433 , 0.14215727, 0.00903798],
[0.41407049, 0.95964501, 0.19565154, 0.03157038],
[0.88239715, 0.65602688, 0.20304053, 0.01066663],
[0.65413307, 0.92372214, 0.36504241, 0.03482278]])
I would like to calculate the pairwise distances between these 4 points and choose the smallest one, together with the "labels" of the two points involved (where a label is just the array index).
The ideal output would be something like:
"The smallest distance is x, and it occurs between cluster 0 and cluster 3"
By "distance" I mean Euclidean distance.
Is there a Pythonic way of doing this?
You can try scipy.spatial.distance.pdist(your_array), which gives you the pairwise distances between the points (as a condensed distance matrix, i.e. a 1-D vector with one entry per pair). Then take the minimum of those distances.
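As a minimal sketch of that last step (one possible reading of the suggestion, not necessarily what the answerer had in mind): the condensed vector can be expanded into a full square matrix with scipy.spatial.distance.squareform, after which the position of the smallest off-diagonal entry gives the two cluster indices. The variable names centers, dist_matrix, i, j and min_dist are illustrative only.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Cluster centers copied from the question.
centers = np.array([[0.69332691, 0.9118433 , 0.14215727, 0.00903798],
                    [0.41407049, 0.95964501, 0.19565154, 0.03157038],
                    [0.88239715, 0.65602688, 0.20304053, 0.01066663],
                    [0.65413307, 0.92372214, 0.36504241, 0.03482278]])

# Full symmetric matrix of Euclidean distances (pdist's default metric).
dist_matrix = squareform(pdist(centers))

# The diagonal holds the zero distance of each point to itself; mask it out
# so it cannot be picked as the minimum.
np.fill_diagonal(dist_matrix, np.inf)

# Row and column of the smallest remaining entry are the two cluster indices.
i, j = np.unravel_index(np.argmin(dist_matrix), dist_matrix.shape)
min_dist = dist_matrix[i, j]

print(f"The smallest distance is {min_dist}, and it occurs between "
      f"cluster {i} and cluster {j}")
```

The square matrix stores every distance twice, which is harmless for a handful of cluster centers but wasteful for large inputs.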
The solution to your problem consists of 2 parts: computing the pairwise distance matrix of the cl_centers array, and working out which pair of indices the smallest distance belongs to.
So, as @zelenov aleksey suggested for the first part, scipy.spatial.distance.pdist will calculate the pairwise distances, and then you can create a list of combinations of pairwise indices to select from using itertools.combinations.
The following will give you the ideal output you stated in your question:
import itertools as it
import numpy as np
from scipy.spatial.distance import pdist

centers_arr = np.array([[0.69332691, 0.9118433 , 0.14215727, 0.00903798],
                        [0.41407049, 0.95964501, 0.19565154, 0.03157038],
                        [0.88239715, 0.65602688, 0.20304053, 0.01066663],
                        [0.65413307, 0.92372214, 0.36504241, 0.03482278]])

# All index pairs (i, j) with i < j, in the same order pdist reports distances.
pairs = list(it.combinations(range(len(centers_arr)), 2))
# Condensed vector of pairwise Euclidean distances (pdist's default metric).
d = pdist(centers_arr)

print("The smallest distance is {}, and it occurs between cluster {} and cluster {}".format(d.min(), *pairs[d.argmin()]))
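The same idea can be wrapped in a small helper that works for any number of centers. This is only a sketch: closest_pair is a hypothetical name, and it swaps itertools.combinations for np.triu_indices, relying on the fact that the upper-triangle indices with k=1 come out in the same order, (0, 1), (0, 2), ..., as the entries of pdist's condensed vector.

```python
import numpy as np
from scipy.spatial.distance import pdist


def closest_pair(points):
    """Return (distance, i, j) for the two closest rows of `points`.

    Hypothetical helper for illustration; it needs at least two points.
    """
    points = np.asarray(points, dtype=float)
    d = pdist(points)  # condensed Euclidean distances
    # Row/column indices of the strict upper triangle, in pdist's pair order.
    rows, cols = np.triu_indices(len(points), k=1)
    k = d.argmin()
    return float(d[k]), int(rows[k]), int(cols[k])


centers_arr = np.array([[0.69332691, 0.9118433 , 0.14215727, 0.00903798],
                        [0.41407049, 0.95964501, 0.19565154, 0.03157038],
                        [0.88239715, 0.65602688, 0.20304053, 0.01066663],
                        [0.65413307, 0.92372214, 0.36504241, 0.03482278]])

dist, a, b = closest_pair(centers_arr)
print("The smallest distance is {}, and it occurs between cluster {} and cluster {}".format(dist, a, b))
```

With a fitted model, the call would presumably be closest_pair(kmeans_results.cluster_centers_), using the object name from the question.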