How to calculate distance between cluster centres of kmeans and choose the minimum in python?

I have run a k-means algorithm using sklearn.cluster.KMeans and saved the results in the object kmeans_results.

I can do cl_centers = kmeans_results.cluster_centers_ to obtain the cluster centers.

cl_centers looks like this:

array([[0.69332691, 0.9118433 , 0.14215727, 0.00903798],
       [0.41407049, 0.95964501, 0.19565154, 0.03157038],
       [0.88239715, 0.65602688, 0.20304053, 0.01066663],
       [0.65413307, 0.92372214, 0.36504241, 0.03482278]])

I would like to calculate the pairwise distances between these 4 points and choose the smallest one, together with the "labels" of the two points involved (where a label is just the array index).

The ideal output would be something like:

"The smallest distance is x, and it occurs between cluster 0 and cluster 3"

By "distance" I mean Euclidean distance.

Is there a Pythonic way of doing this?

You can try scipy.spatial.distance.pdist(your_array), which gives you the pairwise distances between the points as a condensed (flattened upper-triangular) vector rather than a square matrix. Then take the minimum of that vector.
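One way to follow this suggestion, sketched here under the assumption that a full square distance matrix is convenient to work with, is to expand the pdist result with scipy.spatial.distance.squareform and locate the smallest off-diagonal entry. The variable names (centers, dist_matrix) are illustrative and not from the original post.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Cluster centers copied from the question.
centers = np.array([[0.69332691, 0.9118433 , 0.14215727, 0.00903798],
                    [0.41407049, 0.95964501, 0.19565154, 0.03157038],
                    [0.88239715, 0.65602688, 0.20304053, 0.01066663],
                    [0.65413307, 0.92372214, 0.36504241, 0.03482278]])

# pdist returns one Euclidean distance per unordered pair (i < j).
condensed = pdist(centers)

# squareform expands the condensed vector into a symmetric n x n matrix.
dist_matrix = squareform(condensed)

# The diagonal holds zeros (each point to itself); mask it so it is not picked.
np.fill_diagonal(dist_matrix, np.inf)

# Convert the flat argmin position back into (row, column) indices.
i, j = np.unravel_index(np.argmin(dist_matrix), dist_matrix.shape)
min_dist = dist_matrix[i, j]

print(f"The smallest distance is {min_dist}, "
      f"and it occurs between cluster {i} and cluster {j}")
```

For a handful of cluster centers the extra memory of the square matrix is negligible; for many points, working directly on the condensed vector avoids building it.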

The solution to your problem consists of 2 parts.

  1. Calculate the pairwise distance matrix of the cl_centers array.
  2. Find the indices of the minimum position.

As @zelenov aleksey suggested for the first part, scipy.spatial.distance.pdist will calculate the pairwise distances. You can then use itertools.combinations to build the list of index pairs, which comes out in the same order as the pdist result, and look up the pair at the position of the minimum.
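The approach relies on pdist and itertools.combinations enumerating the pairs in the same order. A small check on toy data (the three points below are made up for illustration and are not from the question) can make that correspondence visible:

```python
import itertools as it

import numpy as np
from scipy.spatial.distance import pdist

# Toy points chosen so the distances are easy to verify by hand.
points = np.array([[0.0, 0.0],
                   [3.0, 4.0],
                   [6.0, 8.0]])

# Index pairs (i, j) with i < j, in lexicographic order.
pairs = list(it.combinations(range(len(points)), 2))

# Condensed distances, one per pair.
condensed = pdist(points)

for (i, j), d in zip(pairs, condensed):
    # Recompute each distance directly to confirm the pairing is right.
    direct = np.linalg.norm(points[i] - points[j])
    assert np.isclose(d, direct)
    print(f"pair ({i}, {j}): distance {d}")
```

Each entry of the condensed vector matches the directly computed distance for the pair at the same position, which is why indexing pairs with d.argmin() gives the right labels.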

The following will give you the ideal output you stated in your question:

import numpy as np
from scipy.spatial.distance import pdist
import itertools as it

centers_arr = np.array([[0.69332691, 0.9118433 , 0.14215727, 0.00903798],
       [0.41407049, 0.95964501, 0.19565154, 0.03157038],
       [0.88239715, 0.65602688, 0.20304053, 0.01066663],
       [0.65413307, 0.92372214, 0.36504241, 0.03482278]])

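# All index pairs (i, j) with i < j, in the same order pdist uses.
pairs = list(it.combinations(range(len(centers_arr)), 2))

# Condensed vector of pairwise Euclidean distances.
d = pdist(centers_arr)
print("The smallest distance is {}, and it occurs between cluster {} and cluster {}".format(d.min(), *pairs[d.argmin()]))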
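If this is needed for an arbitrary number of clusters, the same idea can be wrapped in a small helper. The sketch below is one possible variant, not part of the original answer: closest_pair is a hypothetical name, and it uses np.triu_indices instead of itertools.combinations to map the condensed index back to a pair of labels.

```python
import numpy as np
from scipy.spatial.distance import pdist


def closest_pair(centers):
    """Return (distance, i, j) for the two closest rows of `centers`.

    Hypothetical helper for illustration; assumes at least two rows.
    """
    centers = np.asarray(centers)
    d = pdist(centers)  # condensed Euclidean distances
    # Row/column indices of the strict upper triangle, in pdist's order.
    rows, cols = np.triu_indices(len(centers), k=1)
    idx = d.argmin()
    return float(d[idx]), int(rows[idx]), int(cols[idx])


# Cluster centers copied from the question.
cl_centers = np.array([[0.69332691, 0.9118433 , 0.14215727, 0.00903798],
                       [0.41407049, 0.95964501, 0.19565154, 0.03157038],
                       [0.88239715, 0.65602688, 0.20304053, 0.01066663],
                       [0.65413307, 0.92372214, 0.36504241, 0.03482278]])

dist, i, j = closest_pair(cl_centers)
print(f"The smallest distance is {dist}, "
      f"and it occurs between cluster {i} and cluster {j}")
```

In practice cl_centers would come from kmeans_results.cluster_centers_, as in the question, rather than being typed in by hand.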
