Euclidean distance between elements in two different matrices?

Question

I am trying to determine the Euclidean distance of my documents from their centroids. The dimensions of the two arrays in question (the points, X, and centers) satisfy the XA and XB dimensional requirements for scipy.spatial.distance.cdist, but I don't know why I'm getting the ValueError below.

My code:

import pandas as pd, numpy as np
from scipy.spatial.distance import cdist
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

corpus = pd.Series(["bye bye brutal good bye apple banana orange", "bye bye hello apple banana", "corn wheat apple banana goodbye cookie brutal", "fruit cake banana apple bye sweet sweet"])
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)
model = KMeans(n_clusters=2)
model.fit(X)
centers = model.cluster_centers_

cdist(X, centers)

This is the error I get:

ValueError: setting an array element with a sequence.

From the scipy.spatial.distance.cdist documentation:

Parameters:
    XA: ndarray
        An Ma by n array of Ma original observations in an n-dimensional space
    XB: ndarray
        An Mb by n array of Mb original observations in an n-dimensional space
...

My X and centers numpy arrays certainly satisfy these dimensional conditions for cdist, right? What am I missing?

Answer 1

You only need one small change:

cdist(X.toarray(), centers)

X is a scipy.sparse.csr.csr_matrix, the sparse matrix type that TfidfVectorizer.fit_transform returns, and cdist does not accept sparse input directly. The toarray() method converts it to a dense NumPy array that cdist can use.
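To see the conversion in isolation, here is a minimal sketch that needs only NumPy and SciPy. The small points and centers arrays are made-up stand-ins for the TF-IDF matrix and the cluster centers, and distances_to_centers is a hypothetical helper name, not part of either library.

```python
import numpy as np
from scipy import sparse
from scipy.spatial.distance import cdist

# Hypothetical stand-in for the TF-IDF matrix: 3 points in 2 dimensions, stored sparsely.
points = sparse.csr_matrix(np.array([[0.0, 0.0],
                                     [3.0, 4.0],
                                     [1.0, 0.0]]))

# Hypothetical stand-in for the cluster centers: a plain dense array.
centers = np.array([[0.0, 0.0],
                    [3.0, 0.0]])


def distances_to_centers(points, centers):
    """Return an (n_points, n_centers) array of Euclidean distances."""
    # cdist works on dense arrays, so densify sparse input first.
    if sparse.issparse(points):
        points = points.toarray()
    return cdist(points, centers, metric="euclidean")


dists = distances_to_centers(points, centers)
print(dists.shape)  # (3, 2): one row per point, one column per center
print(dists)
```

Note that toarray() materializes the full dense matrix, which can use a lot of memory for a large vocabulary. In that case, sklearn.metrics.pairwise_distances may be worth a look, since it accepts sparse input for the Euclidean metric.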

solution1
2 ACCEPTED 2016-08-05 13:19:11
