How can I cluster two datasets into one heatmap using two separate scales in Python?

Question

I am trying to perform a clustering analysis on two datasets using Seaborn's cluster heatmap function (clustermap).

The problem is that the two datasets come from two different procedures, so their values are distributed differently: the first dataset has values ranging from 0 to 1, while the second ranges from 1000 to 5000.

My question is:

How can I cluster two datasets that have different ranges of values? Is there a way to cluster the rows of the datasets into a single heatmap, perhaps with a separate scale for each dataset?

Here is what I have tried so far, with little success:

import pandas as pd
import seaborn as sns

# First, combine the two datasets into one DataFrame (axis=0 stacks the rows)
dataset = pd.concat([dataset_1, dataset_2], axis=0)

# Then pass the DataFrame to Seaborn's clustermap(), clustering the rows only
sns.clustermap(data=dataset, col_cluster=False)

Output: the features of dataset_1 are all washed out because of the large difference in scale between dataset_1 and dataset_2.

Any idea how to approach this problem?
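A minimal sketch of the "one clustering, two scales" idea from the question, not a built-in seaborn feature. It assumes both datasets describe the same samples in the same row order, with their features as columns; the random data, the names `sample_i`, `a0..`, `b0..`, and the choice of average linkage are hypothetical. The rows are clustered once on z-scored features (so neither dataset dominates the distances), and then each dataset is drawn in that shared row order with its own colorbar.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.cluster.hierarchy import linkage, leaves_list

# Hypothetical stand-ins: 20 shared samples (rows), features on very different scales
rng = np.random.default_rng(0)
idx = [f"sample_{i}" for i in range(20)]
dataset_1 = pd.DataFrame(rng.uniform(0, 1, (20, 4)), index=idx,
                         columns=[f"a{j}" for j in range(4)])
dataset_2 = pd.DataFrame(rng.uniform(1000, 5000, (20, 3)), index=idx,
                         columns=[f"b{j}" for j in range(3)])

# Cluster the rows once, on z-scored features, so both datasets weigh in equally
combined = pd.concat([dataset_1, dataset_2], axis=1)
z = (combined - combined.mean()) / combined.std()
row_order = leaves_list(linkage(z.values, method="average", metric="euclidean"))

# Draw each dataset in the shared row order, each with its own color scale
fig, axes = plt.subplots(
    1, 2, figsize=(8, 6),
    gridspec_kw={"width_ratios": [dataset_1.shape[1], dataset_2.shape[1]]},
)
sns.heatmap(dataset_1.iloc[row_order], ax=axes[0], cmap="viridis",
            cbar_kws={"label": "dataset_1"})
sns.heatmap(dataset_2.iloc[row_order], ax=axes[1], cmap="magma",
            cbar_kws={"label": "dataset_2"}, yticklabels=False)
fig.tight_layout()
```

The row dendrogram is omitted to keep the sketch short; `scipy.cluster.hierarchy.dendrogram` could draw it from the same linkage matrix on an extra axis.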

Answer 1

You could use scikit-learn's preprocessing module, specifically its scaling function (`sklearn.preprocessing.scale`), to standardize the data before creating the clustermap.

The documentation is here: http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.scale.html#sklearn.preprocessing.scale
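A minimal sketch of this answer's suggestion, assuming (hypothetically) that both datasets share the same samples as rows and have different feature columns; the random data and column names are made up for illustration. Each dataset is standardized on its own with `scale()` before the two are combined, so both end up in a comparable range. Scaling each dataset separately also applies if the datasets instead share columns and are stacked with `axis=0` as in the question.

```python
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import scale

# Hypothetical stand-ins: 20 shared samples (rows), features on very different scales
rng = np.random.default_rng(1)
idx = [f"sample_{i}" for i in range(20)]
dataset_1 = pd.DataFrame(rng.uniform(0, 1, (20, 4)), index=idx,
                         columns=[f"a{j}" for j in range(4)])
dataset_2 = pd.DataFrame(rng.uniform(1000, 5000, (20, 3)), index=idx,
                         columns=[f"b{j}" for j in range(3)])

def standardize(df):
    # scale() centers each column to mean 0 and scales it to unit variance;
    # wrap the returned NumPy array back into a DataFrame to keep the labels
    return pd.DataFrame(scale(df), index=df.index, columns=df.columns)

# Scale each dataset on its own, then combine them side by side
dataset = pd.concat([standardize(dataset_1), standardize(dataset_2)], axis=1)

# Cluster the rows only; a diverging colormap centered at 0 suits z-scores
g = sns.clustermap(dataset, col_cluster=False, cmap="vlag", center=0)
```

For this side-by-side layout, seaborn's own `z_score=1` option of `clustermap` (which z-scores each column) gives a similar standardization without scikit-learn; note it uses the sample standard deviation, while `scale()` uses the population one.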


Answer 1: score -1, accepted, 2018-02-06 16:49:33