Using kmeans clustering to find the indices of all points corresponding to a particular centroid

Question

Here is a simple implementation of kmeans clustering (the points in the clusters are labelled from 1 to 500):

from pylab import plot, show
import matplotlib.pyplot as plt
from numpy import vstack, array
from numpy.random import rand
from scipy.cluster.vq import kmeans, vq

# data generation: two overlapping groups of 150 points each
data = vstack((rand(150, 2) + array([.5, .5]), rand(150, 2)))

# computing K-Means with K = 2 (2 clusters)
centroids, _ = kmeans(data, 2)
# assign each sample to a cluster
idx, _ = vq(data, centroids)

# ignore this, just labelling each point in cluster
# (`labels` was not defined in the original post; numbering the rows from 1 is an assumption)
labels = range(1, len(data) + 1)
for label, x, y in zip(labels, data[:, 0], data[:, 1]):
    plt.annotate(
        str(label),
        xy=(x, y), xytext=(-20, 20),
        textcoords='offset points', ha='right', va='bottom',
        bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.5),
        arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0'))

# some plotting using numpy's logical indexing
plot(data[idx == 0, 0], data[idx == 0, 1], 'ob',
     data[idx == 1, 0], data[idx == 1, 1], 'or')
plot(centroids[:, 0], centroids[:, 1], 'sg', markersize=8)
show()

I am trying to find the indices of all the points in each cluster.

Answer 1

You already have...

plot(data[idx==0,0],data[idx==0,1],'ob',
     data[idx==1,0],data[idx==1,1],'or')

Guess what idx does, and what data[idx==0] and data[idx==1] contain.
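To make that hint concrete, here is a minimal sketch on a tiny made-up array. The `data` and `idx` values below are hypothetical stand-ins for the question's arrays, not output from the real run.

```python
import numpy as np

# Hypothetical stand-ins for the question's `data` and `idx`
data = np.array([[0.9, 1.1], [0.2, 0.3], [1.2, 0.8], [0.1, 0.4], [1.0, 1.3]])
idx = np.array([0, 1, 0, 1, 0])

mask = idx == 0                          # boolean array, True where a point belongs to cluster 0
points_in_0 = data[mask]                 # the coordinates of those points
rows_in_0 = np.arange(len(data))[mask]   # the row numbers of those points

print(rows_in_0.tolist())    # [0, 2, 4]
print(points_in_0.shape)     # (3, 2)
```

So idx==0 selects rows, and the same mask applied to the row numbers gives the indices the question asks for.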

Answer 2

On this line:

idx,_ = vq(data,centroids)

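you have already generated a vector containing, for each point (row) in the data array, the index of its nearest centroid.

It seems you want the row indices of all the points that are closest to centroid 0, centroid 1, and so on. You can use np.nonzero to find the indices where idx == i, where i is the centroid you are interested in.

For example: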

import numpy as np

in_0 = np.nonzero(idx == 0)[0]
in_1 = np.nonzero(idx == 1)[0]
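If there are more than two centroids, the same idea can be written as a loop. This is only a sketch: the `idx` array and `k` below are hypothetical, and `np.flatnonzero(x)` is used as a shorthand for `np.nonzero(x)[0]` on a 1-D array.

```python
import numpy as np

idx = np.array([0, 1, 0, 1, 0, 1, 1])   # hypothetical output of vq(data, centroids)
k = 2                                    # number of centroids

# Map each centroid number to the row indices of the points assigned to it
members = {i: np.flatnonzero(idx == i) for i in range(k)}

for i, rows in members.items():
    print(i, rows.tolist())
# 0 [0, 2, 4]
# 1 [1, 3, 5, 6]
```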

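In the comments you also asked why the idx vector differs from run to run. This is because, if you pass an integer as the second argument to kmeans, the centroid positions are initialized randomly (see the kmeans documentation).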
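One way to get the same assignment on every run is to pass an array of starting centroids instead of an integer, since kmeans then uses that array as its initial guess. The sketch below assumes that picking one data point from each half of the data is a reasonable starting guess; that choice, and the fixed seed used to build the sample data, are illustrative only.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

rng = np.random.RandomState(0)   # fixed seed only so the sample data is repeatable
data = np.vstack((rng.rand(150, 2) + np.array([.5, .5]), rng.rand(150, 2)))

# Assumed choice: one starting centroid taken from each half of the data
init = data[[0, 150]]

# Run the clustering twice from the same starting centroids
centroids_a, _ = kmeans(data, init)
centroids_b, _ = kmeans(data, init)
idx_a, _ = vq(data, centroids_a)
idx_b, _ = vq(data, centroids_b)

# Both runs start from the same centroids, so the assignments should match
print(np.array_equal(idx_a, idx_b))
```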

2 answers

Answer 1: score 1, posted 2016-02-02 11:28:16

Answer 2: score 1, accepted, posted 2016-02-02 22:34:33
