I am trying to use kmeans clustering in scipy, exactly the one present here:
What I am trying to do is convert a list of lists such as the following:
data_without_x = [
[0, 0, 0, 0, 0, 0, 0, 20.0, 1.0, 48.0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1224.0, 125.5, 3156.0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 22.5, 56.0, 41.5, 85.5, 0, 0, 0, 0, 0, 0, 0, 0, 1495.0, 3496.5, 2715.0, 5566.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
]
into an ndarray in order to use it with the kmeans method. When I try to convert the list of lists into the ndarray I get an empty array, thus voiding the whole analysis. The length of the ndarray is variable and depends on the number of samples gathered, but I can get that easily with len(data_without_x).
Here is a snippet of the code that returns the empty array.
import numpy as np
from scipy.cluster.vq import whiten, kmeans
# plus the project's own modules (data_preparation, result_som), not shown here

data, data_without_x = data_preparation.generate_sampled_pdf()
nodes_stats, k, list_of_list = result_som.get_number_k()
data_array = np.array(data_without_x)
whitened = whiten(data_array)
centroids, distortion = kmeans(whitened, int(k), iter=100000)
This is what I get as output, saved to a simple log file:
___________________________
this is the data array[[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
...,
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]]
___________________________
This is the whitened array[[ nan nan nan ..., nan nan nan]
[ nan nan nan ..., nan nan nan]
[ nan nan nan ..., nan nan nan]
...,
[ nan nan nan ..., nan nan nan]
[ nan nan nan ..., nan nan nan]
[ nan nan nan ..., nan nan nan]]
___________________________
Does anybody have a clue about what happens when I try to convert the list of lists into a numpy.array?
Thanks for your help
That is exactly how to convert a list of lists to an ndarray in Python. Are you sure your data_without_x is filled correctly? On my machine:
data = [[1,2,3,4],[5,6,7,8]]
data_arr = np.array(data)
data_arr
array([[1,2,3,4],
[5,6,7,8]])
This is the behavior I think you're expecting.
Looking at your input, you have a lot of zeros. Keep in mind that the printout doesn't show all of it, so you may just be seeing the zeros from your input. Examine a specific nonzero element to be sure.
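One way to run that check, as a rough sketch: the rows below are a shortened stand-in for the question's data (an assumption, not the full 36-column rows). np.count_nonzero and np.nonzero report how many entries are nonzero and where they sit, regardless of how the array is abbreviated when printed.

```python
import numpy as np

# Shortened stand-in for data_without_x (assumption: the real rows have 36 columns).
data_without_x = [
    [0, 0, 0, 20.0, 1.0, 48.0, 0, 0],
    [0, 0, 22.5, 56.0, 41.5, 85.5, 0, 0],
]
data_array = np.array(data_without_x)

# The shape shows whether the conversion produced a 2-D array at all.
shape = data_array.shape

# Count the nonzero entries and locate them (row-major order).
nonzero_count = int(np.count_nonzero(data_array))
rows, cols = np.nonzero(data_array)
first_nonzero = float(data_array[rows[0], cols[0]])

print(shape, nonzero_count, first_nonzero)
```

If nonzero_count is greater than zero, the array is not empty and the zeros in the log are just what the abbreviated printout happens to show.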
vq.whiten and vq.kmeans expect an array of shape (M, N), where each row is an observation. So transpose your data_array:
import numpy as np
import scipy.cluster.vq as vq
np.random.seed(2013)
data_without_x = [
[0, 0, 0, 0, 0, 0, 0, 20.0, 1.0, 48.0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1224.0, 125.5, 3156.0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 22.5, 56.0, 41.5, 85.5, 0, 0, 0, 0, 0, 0, 0, 0, 1495.0,
3496.5, 2715.0, 5566.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
]
data_array = np.array(data_without_x).T
whitened = vq.whiten(data_array)
centroids, distortion = vq.kmeans(whitened, 5)
print(centroids)
yields
[[ 1.22649791e+00 2.69573144e+00]
[ 3.91943108e-03 5.57406434e-03]
[ 5.73668382e+00 4.83161524e+00]
[ 0.00000000e+00 1.29763133e+00]]
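A possible explanation for the nan values in the question, offered as an assumption rather than something confirmed in this thread: whitening divides each column by its standard deviation, so a column that is zero in every row leads to 0/0. The sketch below reproduces that with plain numpy on toy data (the obs values are made up for illustration) instead of calling vq.whiten, since how vq.whiten treats zero-variance columns may differ between SciPy versions. It also shows one alternative to transposing, namely dropping the constant columns first.

```python
import numpy as np

# Toy data: 3 observations (rows) x 4 features (columns).
# Columns 0 and 3 are zero in every row, so their standard deviation is 0.
obs = np.array([
    [0.0, 20.0, 1.0, 0.0],
    [0.0, 56.0, 41.5, 0.0],
    [0.0, 30.0, 10.0, 0.0],
])

std = obs.std(axis=0)

# Dividing by a zero standard deviation gives 0/0 = nan in those columns.
with np.errstate(divide="ignore", invalid="ignore"):
    naive = obs / std

# Keep only the columns that actually vary, then scale them to unit variance.
keep = std > 0
cleaned = obs[:, keep] / std[keep]

print(np.isnan(naive).any(), cleaned.shape)
```

Whether to transpose or to drop constant columns depends on what counts as an observation in your data, which only the poster can decide.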
Use numpy's asarray function. It's simple. Ref: https://docs.scipy.org/doc/numpy/reference/generated/numpy.asarray.html
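For reference, a small sketch of how asarray compares with np.array on a nested list (the variable names are only illustrative). Both build the same 2-D array from a list of lists; the difference is that asarray returns an existing ndarray unchanged instead of copying it.

```python
import numpy as np

nested = [[1, 2, 3, 4], [5, 6, 7, 8]]

# From a list of lists, asarray and array give the same result.
from_list = np.asarray(nested)
same_values = bool(np.array_equal(from_list, np.array(nested)))

# Given an ndarray, asarray hands back the same object, while array copies by default.
no_copy = np.asarray(from_list) is from_list
copied = np.array(from_list) is not from_list

print(from_list.shape, same_values, no_copy, copied)
```

So switching to asarray by itself would not be expected to change the values in the question's data_array.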