How to fix Seaborn clustermap matrix?

I have a three-column CSV file that I am trying to convert to a clustered heatmap. My code looks like this:

import pandas as pd
import seaborn as sns

sum_mets = pd.read_csv('sum159_localization_met_magma.csv')
df5 = sum_mets[['Phenotype', 'Gene', 'P']]

clustermap5 = sns.clustermap(df5, cmap='inferno', figsize=(40, 40),
                             pivot_kws={'index': 'Phenotype',
                                        'columns': 'Gene',
                                        'values': 'P'})

I then receive this ValueError:

ValueError: The condensed distance matrix must contain only finite values.

For context, all of my values are non-zero. I am not sure which values it is unable to process. Thank you in advance to anyone who can help.

Even if your raw data has no NaN, you need to check whether your observations are complete, because clustermap pivots the data underneath and any Phenotype/Gene combination that is absent from the table becomes NaN in the pivoted matrix. For example:

import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({'Phenotype': np.repeat(['very not cool', 'not cool', 'very cool', 'super cool'], 4),
                   'Gene': ["Gene" + str(i) for i in range(4)] * 4,
                   'P': np.random.uniform(0, 1, 16)})

pd.pivot(df, columns="Gene", values="P", index="Phenotype")

Gene              Gene0     Gene1     Gene2     Gene3
Phenotype
not cool       0.567653  0.984555  0.634450  0.406642
super cool     0.820595  0.072393  0.774895  0.185072
very cool      0.231772  0.448938  0.951706  0.893692
very not cool  0.227209  0.684660  0.013394  0.711890

The above pivots without NaN and plots well:

sns.clustermap(df,figsize=(5, 5),pivot_kws={'index': 'Phenotype','columns' : 'Gene','values' : 'P'})

But suppose we have one fewer observation:

df1 = df[:15]
pd.pivot(df1,columns="Gene",values="P",index="Phenotype")

Gene              Gene0     Gene1     Gene2     Gene3
Phenotype
not cool       0.106681  0.415873  0.480102  0.721195
super cool     0.961991  0.261710  0.329859       NaN
very cool      0.069925  0.718771  0.200431  0.196573
very not cool  0.631423  0.403604  0.043415  0.373299

And it fails if you try to call clustermap:

sns.clustermap(df1, pivot_kws={'index': 'Phenotype','columns' : 'Gene','values' : 'P'})
ValueError: The condensed distance matrix must contain only finite values.
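
To see which Phenotype/Gene combinations are missing after the pivot, something like the sketch below may help. The helper name missing_combinations is hypothetical (it is not part of pandas or seaborn), and the sketch assumes each Phenotype/Gene pair appears at most once in the long table.

```python
import numpy as np
import pandas as pd


def missing_combinations(long_df, index, columns, values):
    """Hypothetical helper: list (index, column) pairs that are non-finite after pivoting."""
    wide = long_df.pivot(index=index, columns=columns, values=values)
    # NaN marks combinations absent from the long table; inf is flagged too,
    # since the error is about non-finite values in general.
    bad = ~np.isfinite(wide.to_numpy(dtype=float))
    rows, cols = np.where(bad)
    return [(wide.index[r], wide.columns[c]) for r, c in zip(rows, cols)]


# Same toy data as above, with the last observation dropped.
df = pd.DataFrame({
    'Phenotype': np.repeat(['very not cool', 'not cool', 'very cool', 'super cool'], 4),
    'Gene': ["Gene" + str(i) for i in range(4)] * 4,
    'P': np.random.uniform(0, 1, 16),
})
df1 = df[:15]

missing = missing_combinations(df1, 'Phenotype', 'Gene', 'P')
print(missing)  # [('super cool', 'Gene3')]
```

An empty list means the pivoted matrix is complete and clustermap should be able to cluster it directly.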

I suggest checking whether the missing values are intended or a mistake. If you really do have missing values, you can get around the failing clustering step by pre-computing the linkage and passing it to the function, for example using a correlation-based distance as below:

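import pandas as pd
import scipy.spatial as sp, scipy.cluster.hierarchy as hc

# The linkage is computed on the wide (pivoted) matrix, not on the long table
mat = pd.pivot(df1, index="Phenotype", columns="Gene", values="P")

# Distance = 1 - Pearson correlation; corr() skips NaN pairwise
row_dism = 1 - mat.T.corr()
row_linkage = hc.linkage(sp.distance.squareform(row_dism), method='complete')
col_dism = 1 - mat.corr()
col_linkage = hc.linkage(sp.distance.squareform(col_dism), method='complete')

sns.clustermap(mat, figsize=(5, 5), row_linkage=row_linkage, col_linkage=col_linkage)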
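
One caveat with the correlation approach: corr() skips NaN pairwise, so if two rows (or columns) share fewer than two observations, their correlation is itself NaN and linkage would hit the same error. Below is a minimal sketch that guards against this. The name correlation_linkage is hypothetical, and zeroing the diagonal and passing checks=False to squareform are assumptions made here to sidestep round-off, not something the code above does.

```python
import numpy as np
import pandas as pd
import scipy.cluster.hierarchy as hc
import scipy.spatial.distance as ssd


def correlation_linkage(mat, method="complete"):
    """Hypothetical helper: linkage over the rows of `mat` using 1 - Pearson correlation."""
    # Pairwise-complete correlation between rows (NaN cells are skipped per pair)
    dism = (1 - mat.T.corr()).to_numpy(dtype=float).copy()
    # Assumption: force an exact zero diagonal to guard against round-off
    np.fill_diagonal(dism, 0.0)
    if not np.isfinite(dism).all():
        raise ValueError("some pairs share too few observations to be correlated")
    # checks=False because the diagonal and symmetry were handled above
    return hc.linkage(ssd.squareform(dism, checks=False), method=method)


# Toy wide matrix with a single missing cell, mirroring the example above
rng = np.random.default_rng(0)
mat = pd.DataFrame(rng.uniform(0, 1, (4, 4)),
                   index=['not cool', 'super cool', 'very cool', 'very not cool'],
                   columns=['Gene0', 'Gene1', 'Gene2', 'Gene3'])
mat.loc['super cool', 'Gene3'] = np.nan

row_linkage = correlation_linkage(mat)    # clusters the phenotypes
col_linkage = correlation_linkage(mat.T)  # clusters the genes
```

The resulting row_linkage and col_linkage can be passed to sns.clustermap through its row_linkage and col_linkage arguments, as in the call above.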

