How to calculate distance between cluster centres of kmeans and choose the minimum in python?

I have run a k-means algorithm using sklearn.cluster.KMeans and saved the result in the object kmeans_results.

I can do cl_centers = kmeans_results.cluster_centers_ in order to obtain the cluster centers.

cl_centers looks like this:

array([[0.69332691, 0.9118433 , 0.14215727, 0.00903798],
       [0.41407049, 0.95964501, 0.19565154, 0.03157038],
       [0.88239715, 0.65602688, 0.20304053, 0.01066663],
       [0.65413307, 0.92372214, 0.36504241, 0.03482278]])

I would like to calculate the pairwise distances between these 4 points and pick the smallest one, together with the "labels" of the two points involved (where a label is just the array index).

The ideal output would be something like:

"The smallest distance is x, and it occurs between cluster 0 and cluster 3"

By "distance" I mean Euclidean distance.

Is there a Pythonic way of doing this?

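You can try scipy.spatial.distance.pdist(your_array), which returns the pairwise distances between the points as a condensed 1-D array (pass it to scipy.spatial.distance.squareform if you want the full square matrix). Then take the minimum of that array.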
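A minimal sketch of that suggestion, assuming the four centers from the question. The variable names (centers, condensed, square) are only illustrative. It expands the pdist output with squareform and masks the zero diagonal so that a point's distance to itself is not picked as the minimum:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Cluster centers copied from the question.
centers = np.array([[0.69332691, 0.9118433 , 0.14215727, 0.00903798],
                    [0.41407049, 0.95964501, 0.19565154, 0.03157038],
                    [0.88239715, 0.65602688, 0.20304053, 0.01066663],
                    [0.65413307, 0.92372214, 0.36504241, 0.03482278]])

condensed = pdist(centers)       # 1-D array with n*(n-1)/2 = 6 Euclidean distances
square = squareform(condensed)   # symmetric 4x4 matrix with zeros on the diagonal

# Ignore the diagonal, otherwise the minimum is always the trivial 0.
np.fill_diagonal(square, np.inf)

# Convert the flat argmin position back into a (row, column) pair.
i, j = np.unravel_index(square.argmin(), square.shape)
print("The smallest distance is {}, and it occurs between cluster {} and cluster {}".format(square[i, j], i, j))
```

The square matrix stores every distance twice, so for a large number of centers it may be preferable to work on the condensed array directly.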

The solution to your problem consists of 2 parts.

  1. Calculate the pairwise distances between the rows of the cl_centers array.
  2. Find the indices of the pair at the minimum position.

As @zelenov aleksey suggested, scipy.spatial.distance.pdist handles the first part: it calculates the pairwise distances. For the second part, you can build the list of index pairs with itertools.combinations, which yields the pairs in the same order as pdist, and index that list with the position of the minimum.

The following will give you the ideal output you stated in your question:

import numpy as np
from scipy.spatial.distance import pdist
import itertools as it

centers_arr = np.array([[0.69332691, 0.9118433 , 0.14215727, 0.00903798],
       [0.41407049, 0.95964501, 0.19565154, 0.03157038],
       [0.88239715, 0.65602688, 0.20304053, 0.01066663],
       [0.65413307, 0.92372214, 0.36504241, 0.03482278]])

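# All index pairs (i, j) with i < j, in the same order pdist uses
pairs = list(it.combinations(range(len(centers_arr)), 2))

# Condensed array of pairwise Euclidean distances
d = pdist(centers_arr)
print("The smallest distance is {}, and it occurs between cluster {} and cluster {}".format(d.min(), *pairs[d.argmin()]))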
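The same idea can be wrapped in a reusable function. The sketch below is one possible variant, not part of the original answer: the name closest_pair is hypothetical, and it uses np.triu_indices instead of itertools.combinations to recover the index pair, relying on both listing the upper-triangle pairs in the same row-major order as pdist.

```python
import numpy as np
from scipy.spatial.distance import pdist


def closest_pair(centers):
    """Return (distance, i, j) for the two closest rows of `centers`.

    Hypothetical helper for illustration; distances are Euclidean.
    """
    centers = np.asarray(centers, dtype=float)
    n = len(centers)
    if n < 2:
        raise ValueError("need at least two centers")

    d = pdist(centers)                     # ordered (0,1), (0,2), ..., (n-2,n-1)
    rows, cols = np.triu_indices(n, k=1)   # index pairs in that same order
    k = d.argmin()                         # position of the smallest distance
    return float(d[k]), int(rows[k]), int(cols[k])


centers_arr = np.array([[0.69332691, 0.9118433 , 0.14215727, 0.00903798],
                        [0.41407049, 0.95964501, 0.19565154, 0.03157038],
                        [0.88239715, 0.65602688, 0.20304053, 0.01066663],
                        [0.65413307, 0.92372214, 0.36504241, 0.03482278]])

dist, a, b = closest_pair(centers_arr)
print("The smallest distance is {}, and it occurs between cluster {} and cluster {}".format(dist, a, b))
```

With a fitted model, the function could be called as closest_pair(kmeans_results.cluster_centers_), since the labels are just the row indices of that array.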
