Dendrogram through scipy given a similarity matrix

I have computed a Jaccard similarity matrix with Python. I want to cluster from highest similarity to lowest; however, no matter what linkage function I use, it produces the same dendrogram! I have a feeling that the function assumes my matrix is the original data, but I have already computed the similarity matrix. Is there any way to pass this similarity matrix through to the dendrogram so it plots correctly? Or am I going to have to output the matrix and simply do it in R? Passing in the original raw data is not possible, as I am computing similarities of words. Thanks for the help!

Here is some code:

# Assumed imports: hcluster is taken to be scipy.cluster.hierarchy
import scipy.cluster.hierarchy as hcluster
from matplotlib.pyplot import show

SimMatrix = [[ 0.        ,  0.09259259,  0.125     ,  0.        ,  0.08571429],
   [ 0.09259259,  0.        ,  0.05555556,  0.        ,  0.05128205],
   [ 0.125     ,  0.05555556,  0.        ,  0.03571429,  0.05882353],
   [ 0.        ,  0.        ,  0.03571429,  0.        ,  0.        ],
   [ 0.08571429,  0.05128205,  0.05882353,  0.        ,  0.        ]]

linkage = hcluster.complete(SimMatrix)  # doesn't matter what linkage...
dendro  = hcluster.dendrogram(linkage)  # same plot for all types?
show()

If you run this code, you will see a dendrogram that is completely backwards. No matter what linkage type I use, it produces the same dendrogram. Intuitively, this cannot be correct!

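Here's the solution. It turns out that SimMatrix first needs to be converted into a condensed matrix: a flat vector holding the entries of one triangle of the matrix (upper right or, equivalently for a symmetric matrix, lower left), without the diagonal. A square 2-D input is interpreted as raw observations, one row per item, which is why the original dendrogram looked wrong. The linkage functions also expect distances rather than similarities, so the condensed vector is passed as 1 - similarity. You can see this in the code below: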

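# Assumed imports: hcluster is taken to be scipy.cluster.hierarchy
import scipy.cluster.hierarchy as hcluster
import scipy.spatial.distance as ssd
from matplotlib.pyplot import show

distVec = ssd.squareform(SimMatrix)     # square matrix -> condensed vector
linkage = hcluster.linkage(1 - distVec) # similarity -> distance
dendro  = hcluster.dendrogram(linkage)
show()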
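
To check that the linkage method really does matter once the input is a condensed distance vector, here is a minimal sketch using the same matrix. The variable names (sim, sim_vec, dist_vec, Z_single, Z_complete) are my own, and treating 1 - similarity as the distance simply follows the code above; it is one common choice, not the only possible one.

```python
import numpy as np
import scipy.cluster.hierarchy as hcluster
import scipy.spatial.distance as ssd

# Same similarity matrix as in the question (symmetric, zero diagonal).
sim = np.array([
    [0.0,        0.09259259, 0.125,      0.0,        0.08571429],
    [0.09259259, 0.0,        0.05555556, 0.0,        0.05128205],
    [0.125,      0.05555556, 0.0,        0.03571429, 0.05882353],
    [0.0,        0.0,        0.03571429, 0.0,        0.0],
    [0.08571429, 0.05128205, 0.05882353, 0.0,        0.0],
])

# Condensed form: the n*(n-1)/2 upper-triangle entries, diagonal excluded.
sim_vec = ssd.squareform(sim)

# Assumption: distance = 1 - similarity, as in the answer above.
dist_vec = 1.0 - sim_vec

# Two different linkage methods on the same condensed distances.
Z_single = hcluster.linkage(dist_vec, method="single")
Z_complete = hcluster.linkage(dist_vec, method="complete")

# Each row of Z is (cluster_a, cluster_b, merge_distance, cluster_size).
print(Z_single)
print(Z_complete)
```

Both methods start by merging items 0 and 2, the pair with the highest similarity (0.125, so distance 0.875), but the later merge heights differ, so the two dendrograms are no longer identical. Either result can be passed to hcluster.dendrogram for plotting.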
