
Dendrogram or Other Plot from Distance Matrix

I have three matrices to compare, each of them 5x6. I originally wanted to use hierarchical clustering to cluster the matrices so that the most similar ones are grouped together, given a similarity threshold.

I could not find any such function in Python, so I implemented the distance measure by hand (the p-norm with p = 2). Now I have a 3x3 distance matrix (which I believe is also a similarity matrix in this case).
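A minimal sketch of that hand-rolled distance computation, assuming "p-norm with p = 2" means the 2-norm of the element-wise difference between two matrices (the Frobenius norm). The three matrices below are hypothetical stand-ins, because the original 5x6 matrices are not shown; matrices 0 and 2 are made identical to mirror the situation described.

```python
import numpy as np

# Hypothetical stand-ins for the three 5x6 matrices (the originals are not
# given): matrices 0 and 2 are identical, matrix 1 is shifted by 1.
a = np.arange(30.0).reshape(5, 6)
matrices = [a, a + 1.0, a.copy()]

n = len(matrices)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        # For a 2-D array, np.linalg.norm defaults to the Frobenius norm,
        # i.e. the 2-norm over all element-wise differences.
        d = np.linalg.norm(matrices[i] - matrices[j])
        dist[i, j] = dist[j, i] = d  # distances are symmetric

print(dist)
```

The result is a symmetric matrix with zeros on the diagonal, the same shape of object as the 3x3 matrix shown in the question.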

I am now trying to produce a dendrogram. Below is my code and the output it gives, which is not what I expected. I want to produce a graph (a dendrogram, if possible) that shows clusters of the matrices that are most similar. Of matrices 0, 1, and 2, matrices 0 and 2 are the same and should be clustered together first, while 1 is different.

The distance matrix looks like this:

     0         1    2
0    0.0       2.0  3.85e-16
1    2.0       0.0  2.0
2    3.85e-16  2.0  0.0

Code:

from scipy.cluster.hierarchy import dendrogram
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import linkage
mat = np.array([[0.0, 2.0, 3.8459253727671276e-16], [2.0, 0.0, 2.0], [3.8459253727671276e-16, 2.0, 0.0]])
dist_mat = mat
linkage_matrix = linkage(dist_mat, "single")
dendrogram(linkage_matrix, color_threshold=1, labels=["0", "1", "2"],show_leaf_counts=True)
plt.title("test")
plt.show()

This is the output: [image: dendrogram produced by the code above]

What is the meaning of linkage(dist_mat, 'single')? I would have expected the output graph to look something like this, where (for example) the distance between 0 and 1 is 2.0: [image: expected dendrogram]

Are there better ways to represent these data? Is there a function that can take several matrices instead of points, compare them to form a distance matrix, and then cluster them? I am open to other suggestions on how to visualize the differences between these matrices.
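One possible way to do this with existing SciPy functions, sketched under assumptions: flatten each 5x6 matrix into a 30-element vector, let scipy.spatial.distance.pdist compute the Euclidean distances (which equals the 2-norm of the element-wise difference), cluster with linkage, and cut the tree at a similarity threshold with fcluster. The matrices and the threshold value 1.0 are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical data: matrices 0 and 2 identical, matrix 1 different.
a = np.arange(30.0).reshape(5, 6)
stack = np.stack([a, a + 1.0, a.copy()])  # shape (3, 5, 6)

# Flatten each matrix into one 30-element observation vector.
observations = stack.reshape(len(stack), -1)

# Condensed Euclidean distances: [d(0,1), d(0,2), d(1,2)].
condensed = pdist(observations, metric="euclidean")

# Single-linkage hierarchical clustering on the condensed distances.
Z = linkage(condensed, method="single")

# Flat clusters: matrices whose merge distance is <= t share a label
# (t = 1.0 is a hypothetical similarity threshold).
labels = fcluster(Z, t=1.0, criterion="distance")
print(labels)
```

The linkage matrix Z can also be passed to dendrogram to draw the tree.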

The first argument of linkage should not be the square distance matrix. It must be the condensed distance matrix, i.e. a flat array of the entries above the diagonal, read row by row. In your case, that would be np.array([2.0, 3.8459253727671276e-16, 2.0]). You can convert from the square distance matrix to the condensed form using scipy.spatial.distance.squareform.
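A short sketch of that conversion on the question's matrix, showing that squareform goes both ways (square to condensed, and condensed back to square):

```python
import numpy as np
from scipy.spatial.distance import squareform

mat = np.array([[0.0, 2.0, 3.8459253727671276e-16],
                [2.0, 0.0, 2.0],
                [3.8459253727671276e-16, 2.0, 0.0]])

# Square -> condensed: the entries above the diagonal, row by row,
# i.e. [d(0,1), d(0,2), d(1,2)]; length n*(n-1)/2 for n points.
condensed = squareform(mat)
print(condensed)

# Condensed -> square: the same function converts back.
restored = squareform(condensed)
```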

If you pass a two-dimensional array with shape (m, n) to linkage, it treats it as an array of m points in n-dimensional space and computes the distances between those points itself. That's why you didn't get an error when you passed in the square distance matrix, but you got an incorrect plot: each row of your matrix was treated as a point in 3-dimensional space. (At the time of writing this was an undocumented "feature" of linkage; more recent SciPy documentation describes it, and recent versions may emit a ClusterWarning when the input looks like an uncondensed distance matrix.)
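A sketch that makes the difference visible on the question's own matrix: passing the square matrix gives the same result as running pdist on its rows, so the final merge happens at the Euclidean distance between rows (about 3.46) rather than at the intended 2.0. The warning filter is only there to silence the warning recent SciPy versions may raise for this input.

```python
import warnings

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

mat = np.array([[0.0, 2.0, 3.8459253727671276e-16],
                [2.0, 0.0, 2.0],
                [3.8459253727671276e-16, 2.0, 0.0]])

with warnings.catch_warnings():
    # Silence any warning about the input looking like a square
    # distance matrix, to show what linkage actually does with it.
    warnings.simplefilter("ignore")
    z_square = linkage(mat, "single")

# What happened internally: rows treated as 3-D points, Euclidean distances.
z_rows = linkage(pdist(mat), "single")

# The intended result: use the given distances directly.
z_condensed = linkage(squareform(mat), "single")

print(z_square[-1, 2])     # about 3.46, i.e. sqrt(12)
print(z_condensed[-1, 2])  # 2.0
```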

Also note that because the distance 3.8e-16 is so small, the horizontal line for the link between points 0 and 2 might not be visible in the plot: it lies on the x axis.

Here's a modified version of your script. For this example, I've changed that tiny distance to 0.1 so that the associated cluster is not hidden by the x axis.

import numpy as np

from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

import matplotlib.pyplot as plt


mat = np.array([[0.0, 2.0, 0.1], [2.0, 0.0, 2.0], [0.1, 2.0, 0.0]])
dists = squareform(mat)
linkage_matrix = linkage(dists, "single")
dendrogram(linkage_matrix, labels=["0", "1", "2"])
plt.title("test")
plt.show()

Here is the plot created by the script:

[image: dendrogram produced by the modified script]
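As for what linkage(..., 'single') means: the second argument selects how the distance between two clusters is measured, and 'single' uses the smallest distance between any member of one cluster and any member of the other (other methods include 'complete' and 'average'). The returned linkage matrix is what dendrogram draws. This sketch prints it for the 0.1 example above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

mat = np.array([[0.0, 2.0, 0.1], [2.0, 0.0, 2.0], [0.1, 2.0, 0.0]])
Z = linkage(squareform(mat), "single")

# Each row records one merge: [cluster a, cluster b, merge distance, size].
# Original points are clusters 0..n-1; the cluster created in row i gets
# index n + i (here 3 for the first merge).
for a, b, dist, size in Z:
    print(f"merge {int(a)} and {int(b)} at distance {dist} -> {int(size)} points")
```

Here 0 and 2 merge at 0.1, then cluster 3 (= {0, 2}) merges with 1 at min(d(1, 0), d(1, 2)) = 2.0, which matches the heights in the plot.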

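Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.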

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM