Plot categorical scatterplot in seaborn or matplotlib
I have the following dataframe:
         it   A   B   C   D
0        10  aa  mn  cd  kk
1       100  ab  cd  wc  ll
2      1000  wc  cd  mn  sf
3     10000  ll  ll  kk  mn
4    100000  wc  kk  mn  cd
5   1000000  aa  ll  we  sf
6  10000000  ss  aa  ss  kk
created as:
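import random
import pandas as pd

options = ["ab", "cd", "bb", "aa", "we", "ss", "kk", "mn", "re", "wc", "ll", "sf"]
# five columns, matching the five values in each row
df = pd.DataFrame(columns=["it", "A", "B", "C", "D"])
for i, it in enumerate([1, 2, 3, 4, 5, 6, 7]):
    # 10**it for column "it", then one random label for each of A-D
    row = [10**it, random.sample(options, 1)[0], random.sample(options, 1)[0],
           random.sample(options, 1)[0], random.sample(options, 1)[0]]
    df.loc[i] = row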
The goal is to create a scatterplot where the y-axis shows the unique values from the dataframe in sorted order (e.g. options) and the x-axis corresponds to column it. Depending on whether a data point belongs to column A, B, C, or D, I want to color the scatter dots differently and add a legend, so that I know which class a dot comes from.
How do I do it in seaborn or matplotlib?
The way I am doing it in matplotlib is:
iters = list(range(df.shape[0]))
x, y = sort(iters, df["A"])
plt.scatter(x, y, color="red")
x, y = sort(iters, df["B"])
plt.scatter(x, y, color="blue")
...
but that does not sort the entire y-axis, only the labels that belong to each separate column.
Let's try stacking the data, converting it to a categorical with the given order, sorting, and plotting:
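import pandas as pd
import seaborn as sns

# long format: one row per (row index, column name) pair
s = df.stack()
# ordered categorical so that sorting follows the order of `options`
s = pd.Series(pd.Categorical(s, categories=options, ordered=True),
              index=s.index)
sns.scatterplot(data=s.sort_values().reset_index(name='value'),
                x='level_0', y='value', hue='level_1')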
Output: [figure]
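To see what the conversion to an ordered categorical does, here is a small pandas-only sketch. The labels and the `order` list are made up for illustration and are not the question's data. Sorting follows the order given in `categories` rather than alphabetical order, and a value that is missing from `categories` becomes NaN.

```python
import pandas as pd

# Hypothetical category order, deliberately not alphabetical
order = ["mn", "aa", "kk"]

# "zz" does not appear in `order`
raw = pd.Series(["kk", "aa", "mn", "aa", "zz"])

cat = pd.Series(pd.Categorical(raw, categories=order, ordered=True))

# sort_values follows the category order; NaN is placed last by default
sorted_labels = cat.sort_values().tolist()
print(sorted_labels)

# Integer codes give each label's position in `order` (-1 marks a missing value)
codes = cat.cat.codes.tolist()
print(codes)
```

So if some labels in the dataframe are not listed in `options`, they would silently drop out of the plot, which is worth checking before plotting.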
Update: if you have a column xvalue and only care about some columns ['A','B','C','D'], use melt instead of stack:
s = df.melt(id_vars='xvalue',
            value_vars=['A','B','C','D'],
            value_name='value',
            var_name='column')
s['value'] = pd.Categorical(s['value'], categories=options, ordered=True)
sns.scatterplot(data=s.sort_values('value'),
                x='xvalue', y='value', hue='column')