
Problem with hierarchical clustering in Python

I am doing hierarchical clustering of a 2-dimensional matrix using the correlation distance metric (i.e. 1 - Pearson correlation). My code is the following (the data is in a variable called "data"):

from hcluster import *

Y = pdist(data, 'correlation')
cluster_type = 'average'
Z = linkage(Y, cluster_type)
dendrogram(Z)

The error I get is:

ValueError: Linkage 'Z' contains negative distances. 

What causes this error? The matrix "data" that I use is simply:

[[  156.651968  2345.168618]
 [  158.089968  2032.840106]
 [  207.996413  2786.779081]
 [  151.885804  2286.70533 ]
 [  154.33665   1967.74431 ]
 [  150.060182  1931.991169]
 [  133.800787  1978.539644]
 [  112.743217  1478.903191]
 [  125.388905  1422.3247  ]]

I don't see how pdist could ever produce negative numbers when taking 1 - Pearson correlation. Any ideas on this?

Thank you.

There are some lovely floating point problems going on. If you look at the results of pdist, you'll find there are very small negative numbers (-2.22044605e-16) in them. Essentially, they should be zero. You can use numpy's clip function to deal with it if you would like.
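A minimal sketch of the clipping suggestion, assuming SciPy's pdist, linkage and dendrogram behave like the hcluster functions used in the question. Each row holds only two values, so in exact arithmetic the Pearson correlation between any two rows is +1 or -1. Here the second value is always the larger one, so every true distance is 0 and only rounding error can push a result below zero.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist

# The matrix from the question: 9 observations with 2 values each.
data = np.array([
    [156.651968, 2345.168618],
    [158.089968, 2032.840106],
    [207.996413, 2786.779081],
    [151.885804, 2286.70533],
    [154.33665, 1967.74431],
    [150.060182, 1931.991169],
    [133.800787, 1978.539644],
    [112.743217, 1478.903191],
    [125.388905, 1422.3247],
])

# Condensed matrix of correlation distances (1 - Pearson correlation).
Y = pdist(data, 'correlation')

# Rounding can leave values a hair below zero; force them to be non-negative.
Y = np.clip(Y, 0, None)

Z = linkage(Y, 'average')

# no_plot=True only computes the tree layout, so matplotlib is not required.
tree = dendrogram(Z, no_plot=True)
print(Y.min(), Z[:, 2].min())
```

Clipping changes only the values that were below zero, so the clustering itself is unaffected. Whether unclipped values come out negative at all may depend on the library version.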

If you are getting the error

KeyError: -428

and your code is along the lines of

import matplotlib.pyplot as plt
import matplotlib as mpl

%matplotlib inline 
from scipy.cluster.hierarchy import ward, dendrogram

linkage_matrix = ward(dist) #define the linkage_matrix using ward clustering pre-computed distances
fig, ax = plt.subplots(figsize=(35, 20),dpi=400) # set size
ax = dendrogram(linkage_matrix, orientation="right",labels=queries);

It is due to a mismatch in the indexes of queries.

You might want to update it to

ax = dendrogram(linkage_matrix, orientation="right",labels=list(queries));
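One plausible reading of this fix, assuming queries is a pandas Series (the post does not say): a Series is looked up by its index labels rather than by position, so an integer that is not in the index raises a KeyError, while a plain list is always indexed by position. The sketch below uses hypothetical data and labels to show the list conversion; points, the label values and the index values are all invented for illustration.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, ward
from scipy.spatial.distance import pdist

# Hypothetical data: four observations and their labels. The Series index
# starts at 10, as it might after filtering rows out of a larger DataFrame.
points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
queries = pd.Series(["a", "b", "c", "d"], index=[10, 11, 12, 13])

# ward() accepts a condensed distance matrix such as the one pdist() returns.
linkage_matrix = ward(pdist(points))

# list(queries) keeps only the values, in order, so lookups are positional.
labels = list(queries)

# no_plot=True computes the leaf order without drawing anything.
tree = dendrogram(linkage_matrix, orientation="right", labels=labels, no_plot=True)
print(tree["ivl"])
```

Calling queries.tolist() or resetting the Series index should have the same effect, as long as the labels stay in the same order as the rows used to build the linkage matrix.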
