Scatter Plot Binary Data Color Coded Points from Data Labels

I'd like to use matplotlib.pyplot.scatter to create a scatter plot similar to the picture below, from data in a dataframe whose header is formatted like the table here. All the points for a given sample should be colored based on the label in the first column of the data, and a point should be plotted only for each gene with a value of 1, with no point for genes with a value of 0:

label  gene a  gene b  gene c  gene d
1      0       1       0       0
0      1       1       0       1
0      0       0       1       0
1      0       0       0       0
1      0       1       0       0

[image: sample scatter plot]

Note: my sample data does not match my sample scatter plot output.

After melting your dataframe to a long format, you can draw a matrix with seaborn's sns.relplot:

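import pandas as pd
import seaborn as sns
sns.set_style("ticks")

# Read the first table from the question page (needs network access
# and an HTML parser such as lxml).
df = pd.read_html('https://stackoverflow.com/q/70856944/14277722')[0]
# Use the row index as the sample id, then melt to long format:
# one row per (label, sample, gene) with the 0/1 entry in "value".
df['sample'] = df.index
df = df.melt(['label','sample'])

# Color by label and size by value: a value of 0 gets the smallest
# marker and a value of 1 the largest.
g = sns.relplot(
    data=df,
    x="variable", y="sample", hue="label", size="value",
    hue_norm=(-1, 1), palette='tab10',
    height=6, sizes=(10, 300), size_norm=(0, 1)
)
g.set(xlabel="Genes", ylabel="Samples")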

[image: matrix plot]
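
Since the question asks specifically for matplotlib.pyplot.scatter and for no point at all where the value is 0, here is a minimal sketch of that variant. It is not part of the answer above. It assumes the sample table is typed in by hand instead of being read from the web page, and the names long_df, hits, x_pos and the two colors are arbitrary choices made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample table from the question, entered by hand (assumption: the
# real data has the same layout, a "label" column plus one column per gene).
df = pd.DataFrame({
    "label":  [1, 0, 0, 1, 1],
    "gene a": [0, 1, 0, 0, 0],
    "gene b": [1, 1, 0, 0, 1],
    "gene c": [0, 0, 1, 0, 0],
    "gene d": [0, 1, 0, 0, 0],
})

# Long format: one row per (sample, gene) pair.
long_df = (
    df.rename_axis("sample")
      .reset_index()
      .melt(id_vars=["label", "sample"], var_name="gene")
)

# Keep only the pairs whose value is 1, so zeros are not drawn.
hits = long_df[long_df["value"] == 1]

# Map each gene name to an integer x position.
genes = [c for c in df.columns if c != "label"]
x_pos = {gene: i for i, gene in enumerate(genes)}

# One scatter call per label so each label gets its own color and legend entry.
colors = {0: "tab:blue", 1: "tab:orange"}  # arbitrary color choice
fig, ax = plt.subplots()
for label, group in hits.groupby("label"):
    ax.scatter(
        group["gene"].map(x_pos),
        group["sample"],
        color=colors[label],
        label=f"label {label}",
    )

ax.set_xticks(range(len(genes)))
ax.set_xticklabels(genes)
ax.set_yticks(list(df.index))
ax.set(xlabel="Genes", ylabel="Samples")
ax.legend(title="label")
# Call plt.show() or fig.savefig(...) to display or save the figure.
```

Filtering before plotting gives the "no point for 0" behavior directly, whereas the relplot version above draws zeros as small markers. A sample with no 1 values (the fourth row here) simply leaves an empty row in the plot.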
