
Euclidean distance between elements in two different matrices?

I am trying to compute the Euclidean distance from each of my documents to the cluster centroids. The two arrays in question (points and centers) satisfy the XA and XB dimensional requirements of scipy.spatial.distance.cdist, but I am getting the ValueError below and don't know why.

My code:

import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

corpus = pd.Series(["bye bye brutal good bye apple banana orange", "bye bye hello apple banana", "corn wheat apple banana goodbye cookie brutal", "fruit cake banana apple bye sweet sweet"])
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)  # document-term matrix
model = KMeans(n_clusters=2)
model.fit(X)
centers = model.cluster_centers_  # one row per cluster

cdist(X, centers)  # this call raises the ValueError

This is the error I get:

ValueError: setting an array element with a sequence.

From the scipy.spatial.distance.cdist documentation:

Parameters:
    XA : ndarray
        An Ma by n array of Ma original observations in an n-dimensional space
    XB : ndarray
        An Mb by n array of Mb original observations in an n-dimensional space
...

My X and centers numpy arrays certainly satisfy these dimensional conditions for cdist, right? What am I missing?

You only need one small change:

cdist(X.toarray(), centers)

X is not a numpy array. TfidfVectorizer returns a sparse matrix of type scipy.sparse.csr.csr_matrix, which cdist does not accept as input even though its shape is correct. The toarray() method converts it to a dense numpy array that cdist can handle.
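For reference, here is a minimal end-to-end sketch of the fix on the corpus from the question. The n_init and random_state arguments are my own additions, included only to make the run repeatable. The comparison against sklearn.metrics.pairwise.euclidean_distances is also an assumption on my part, not something the original answer uses; I include it because that function accepts sparse input directly, which may be preferable when the dense matrix would be too large to hold in memory.

```python
import numpy as np
from scipy import sparse
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import euclidean_distances

corpus = [
    "bye bye brutal good bye apple banana orange",
    "bye bye hello apple banana",
    "corn wheat apple banana goodbye cookie brutal",
    "fruit cake banana apple bye sweet sweet",
]

# TfidfVectorizer returns a SciPy sparse matrix, not a numpy array.
X = TfidfVectorizer().fit_transform(corpus)
is_sparse = sparse.issparse(X)

# n_init and random_state are illustrative choices for repeatability.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
centers = model.cluster_centers_  # dense array, shape (n_clusters, n_features)

# Fix from the answer: densify X before handing it to cdist.
dists_dense = cdist(X.toarray(), centers)

# Possible alternative (assumption): a sparse-aware distance function.
dists_sparse = euclidean_distances(X, centers)

# Distance from each document to the centroid of its assigned cluster.
own_dist = dists_dense[np.arange(len(corpus)), model.labels_]

print(type(X).__name__, dists_dense.shape)
print(own_dist)
```

Each row of dists_dense corresponds to a document and each column to a centroid. Keep in mind that toarray() materializes every zero entry, so for a large vocabulary the dense copy can be far bigger than the sparse original.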
