Dendrogram through scipy given a similarity matrix
I have computed a Jaccard similarity matrix with Python. I want to cluster from highest similarity to lowest, but no matter which linkage function I use, it produces the same dendrogram! I have a feeling that the function assumes my matrix is the original data, but I have already computed the similarity matrix. Is there any way to pass this similarity matrix through to the dendrogram so it plots correctly? Or am I going to have to output the matrix and do it in R? Passing the original raw data is not possible, as I am computing similarities between words. Thanks for the help!

Here is some code:
# assumed imports; the original post did not show them
import scipy.cluster.hierarchy as hcluster
from matplotlib.pyplot import show

SimMatrix = [[0.        , 0.09259259, 0.125     , 0.        , 0.08571429],
             [0.09259259, 0.        , 0.05555556, 0.        , 0.05128205],
             [0.125     , 0.05555556, 0.        , 0.03571429, 0.05882353],
             [0.        , 0.        , 0.03571429, 0.        , 0.        ],
             [0.08571429, 0.05128205, 0.05882353, 0.        , 0.        ]]
linkage = hcluster.complete(SimMatrix)  # doesn't matter which linkage...
dendro = hcluster.dendrogram(linkage)   # same plot for all types?
show()
If you run this code, you will see a dendrogram that is completely backwards. No matter what linkage type I use, it produces the same dendrogram. Intuitively, this cannot be correct!
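Here's the solution. It turns out that SimMatrix first needs to be converted into a condensed matrix: a flat vector holding only the entries above the diagonal (the upper-right triangle, which equals the lower-left one because the matrix is symmetric). When linkage is given a square 2-D array, it treats each row as a raw observation rather than as precomputed distances. The clustering also expects distances rather than similarities, so the condensed vector is subtracted from 1. You can see this in the code below: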
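import scipy.cluster.hierarchy as hcluster  # assumed import; not shown in the original post
import scipy.spatial.distance as ssd
from matplotlib.pyplot import show  # assumed source of show()
# squareform turns the square matrix into its condensed (1-D) form
distVec = ssd.squareform(SimMatrix)
# 1 - similarity converts the similarities into distances
linkage = hcluster.linkage(1 - distVec)
dendro = hcluster.dendrogram(linkage)
show()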
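
To check that the conversion really changes the result, here is a minimal self-contained sketch. It reuses the matrix from the question and skips the plotting. The names sim_matrix, sim_vec, dist_vec, trees, as_observations and via_pdist are my own and do not appear in the original post, and the three linkage methods are just an arbitrary selection for comparison.

```python
import warnings

import numpy as np
import scipy.cluster.hierarchy as hcluster
import scipy.spatial.distance as ssd

# Jaccard similarity matrix from the question (symmetric, zero diagonal).
sim_matrix = np.array([
    [0.0,        0.09259259, 0.125,      0.0,        0.08571429],
    [0.09259259, 0.0,        0.05555556, 0.0,        0.05128205],
    [0.125,      0.05555556, 0.0,        0.03571429, 0.05882353],
    [0.0,        0.0,        0.03571429, 0.0,        0.0],
    [0.08571429, 0.05128205, 0.05882353, 0.0,        0.0],
])

# Condensed form: the n*(n-1)/2 entries above the diagonal as a 1-D vector.
sim_vec = ssd.squareform(sim_matrix)
# Turn similarities into distances, so similar items end up close together.
dist_vec = 1.0 - sim_vec

# A 1-D condensed vector is taken as precomputed distances, so the
# choice of linkage method now affects the resulting tree.
trees = {
    method: hcluster.linkage(dist_vec, method=method)
    for method in ("single", "complete", "average")
}

# A 2-D input is treated as raw observations (one row per point), which
# means Euclidean distances between the rows are computed first. Newer
# SciPy versions may warn about this, so warnings are silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    as_observations = hcluster.linkage(sim_matrix, method="complete")
via_pdist = hcluster.linkage(ssd.pdist(sim_matrix), method="complete")

for method, tree in trees.items():
    print(method, tree[:, 2])  # merge heights for each method
```

The last two results come out identical, which is the situation in the question: the similarity rows were being clustered as if they were coordinates. To draw one of the trees, pass it to hcluster.dendrogram and call show() as in the code above.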