
Using scipy's kmeans2 function in python

I found this example of using the kmeans2 algorithm in Python. I can't understand the following part:

# make some z values
z = numpy.sin(xy[:,1]-0.2*xy[:,1])

# whiten them
z = whiten(z)

# let scipy do its magic (k==3 groups)
res, idx = kmeans2(numpy.array(zip(xy[:,0],xy[:,1],z)),3)

The points are zip(xy[:,0],xy[:,1]), so what is the third value z doing here?

Also, what is whitening?

Any explanation is appreciated. Thanks.

First:

# make some z values
z = numpy.sin(xy[:,1]-0.2*xy[:,1])

The weirdest thing about this is that it's equivalent to:

z = numpy.sin(0.8*xy[:, 1])

So I don't know why it's written that way. Maybe there's a typo?
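The equivalence is easy to check numerically. This is only a sketch: the original `xy` is not shown in the question, so random points of the same shape stand in for it here.

```python
import numpy as np

# Hypothetical stand-in for the example's xy array (30 points in 2-D)
rng = np.random.default_rng(0)
xy = rng.random((30, 2))

z_original = np.sin(xy[:, 1] - 0.2 * xy[:, 1])   # as written in the example
z_simplified = np.sin(0.8 * xy[:, 1])            # y - 0.2*y == 0.8*y

# The two agree up to floating-point rounding
same = np.allclose(z_original, z_simplified)
print(same)
```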

Next,

# whiten them
z = whiten(z)

Whitening simply rescales the data to unit variance: each feature is divided by its standard deviation. See here for a demo:

>>> import numpy as np
>>> from scipy.cluster import vq
>>> z = np.sin(.8*xy[:, 1])      # the original z
>>> zw = vq.whiten(z)            # save it under a different name
>>> zn = z / z.std()             # make another 'normalized' array
>>> list(map(np.std, [z, zw, zn]))   # standard deviations of the three arrays
[0.42645, 1.0, 1.0]
>>> np.allclose(zw, zn)          # whitened is the same as normalized
True
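On a 2-D array of observations, `whiten` does the same thing column by column. The sketch below uses made-up data (three features on very different scales, not from the question) to show that each column ends up with unit standard deviation. Note that the data are only rescaled, not centered.

```python
import numpy as np
from scipy.cluster.vq import whiten

# Hypothetical data: 100 observations, 3 features with different scales
rng = np.random.default_rng(0)
obs = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 100.0])

obs_w = whiten(obs)               # scale each column by its standard deviation
manual = obs / obs.std(axis=0)    # the same operation written out by hand

col_std = obs_w.std(axis=0)       # every column now has standard deviation 1
matches = np.allclose(obs_w, manual)
print(col_std, matches)
```

This is why whitening is commonly applied before k-means: without it, the feature with the largest scale dominates the Euclidean distances.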

It's not obvious to me why it is whitened. Anyway, moving along:

# let scipy do its magic (k==3 groups)
res, idx = kmeans2(numpy.array(zip(xy[:,0],xy[:,1],z)),3)

Let's break that into two parts:

data = np.array(zip(xy[:, 0], xy[:, 1], z))

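which is a weird (and slow) way of writing the line below. (It is also Python 2 code: in Python 3, zip returns an iterator, so it would need to be wrapped as list(zip(...)) to give a 2-D array.)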

data = np.column_stack([xy, z])

In any case, you started with two arrays and merged them into one:

>>> xy.shape
(30, 2)
>>> z.shape
(30,)
>>> data.shape
(30, 3)
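To confirm that the two constructions give the same array, here is a small Python 3 sketch. The `xy` values are a random stand-in for the example's data, and the `zip` call is wrapped in `list(...)` because Python 3's `zip` is lazy.

```python
import numpy as np

# Hypothetical stand-in for the example's data
rng = np.random.default_rng(0)
xy = rng.random((30, 2))
z = np.sin(0.8 * xy[:, 1])

# The example's approach, adapted for Python 3
data_zip = np.array(list(zip(xy[:, 0], xy[:, 1], z)))

# The vectorized equivalent: append z as a third column
data_stack = np.column_stack([xy, z])

identical = np.array_equal(data_zip, data_stack)
print(data_stack.shape, identical)
```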

Then it's data that is passed to the kmeans algorithm:

res, idx = vq.kmeans2(data, 3)

So now you can see that it's 30 points in 3D space that are passed to the algorithm, and the confusing part is how the set of points was created.
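Putting the pieces together, a minimal end-to-end sketch in Python 3 might look like the following. The random `xy` is an assumption (the question does not show how it was built), and `minit="++"` is my choice of initialization, not something the original example specifies. Because k-means starts from random centroids, the exact labels can differ between runs.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Hypothetical stand-in for the example's 30 two-dimensional points
rng = np.random.default_rng(0)
xy = rng.random((30, 2))

# Third coordinate, scaled to unit variance (equivalent to whitening it)
z = np.sin(0.8 * xy[:, 1])
z = z / z.std()

# 30 points in 3-D: columns are x, y, z
data = np.column_stack([xy, z])

# Cluster into k=3 groups; minit="++" (k-means++ seeding) is an assumption
centroids, labels = kmeans2(data, 3, minit="++")

print(centroids.shape)   # one 3-D centroid per cluster
print(labels.shape)      # one cluster index per input point
```

Here `centroids` corresponds to `res` and `labels` to `idx` in the original snippet: `labels[i]` is the index of the centroid closest to `data[i]`.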
