
How to plot categorical data with seaborn, setting the plot style for each data column?

Background

Let's say I have the following dataset:

import pandas as pd
import numpy as np

# Two well-populated categories drawn at random, plus a "Bread" category
# that has only two observations
data = ([["Cheese", x] for x in np.random.normal(0.8, 0.03, 10)] +
        [["Meat", x] for x in np.random.normal(0.4, 0.05, 14)] +
        [["Bread", 0.8], ["Bread", 0.65]])

df = pd.DataFrame(data, columns=["Food", "Score"])


import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="ticks", color_codes=True)
sns.set_context("paper")
# One horizontal box per category
sns.catplot(x="Score", y="Food", kind="box", data=df)
plt.show()

which yields the following plot (or a similar one, depending on the generated random numbers):

[Figure: example box plot]

The reason I am going for box plots with my actual data is that, given the number of categories I want to show, individual dots make the plot visually far too noisy, while the boxes give a good general overview of how the data is distributed, which is what I am after. However, the issue is with categories like "Bread".

Question

As you can see, seaborn produced boxes with median, quartiles, etc. for all three categories. However, since the "Bread" category only has two data points, a box plot is not really an appropriate representation for it. I would much rather show this category only as individual dots.

But the examples at https://seaborn.pydata.org/tutorial/categorical.html only suggest combining box plots and simple dots by plotting both for all categories, which is not what I am after.

In short: how do I plot categorical data with seaborn, selecting the appropriate representation for each category?
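To make the "Bread" problem concrete: a box is drawn from a category's quartiles, and with only two observations those quartiles are simply interpolated between the two values. The sketch below uses NumPy's default linear interpolation in `np.percentile`; I am assuming this matches the statistics behind seaborn's boxes (matplotlib computes its box-plot statistics with `np.percentile`), so treat it as illustrative.

```python
import numpy as np

# The two "Bread" scores from the example dataset
bread = np.array([0.8, 0.65])

# Quartiles and median with NumPy's default (linear) interpolation
q1, median, q3 = np.percentile(bread, [25, 50, 75])
print(q1, median, q3)
```

Mathematically the box would span 0.6875 to 0.7625 with the median line at 0.725, and none of these three values is an actual observation.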

Answer

Maybe try creating separate DataFrames for the "Bread" rows and the non-"Bread" rows:

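# Rows of the sparse category, to be drawn as individual dots
dfb = df[df['Food'].notnull() & (df['Food'] == 'Bread')]
# All other rows, to be drawn as boxes
dfnot_b = df[df['Food'].notnull() & (df['Food'] != 'Bread')]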
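Hard-coding `'Bread'` works for this example. If the real data has several sparse categories, one possible generalization is to split on the number of observations per category instead. This is a sketch: `split_by_min_count` is a hypothetical helper, and the threshold of 5 observations is an arbitrary assumption, not a seaborn setting.

```python
import numpy as np
import pandas as pd

def split_by_min_count(df, cat_col, min_n=5):
    """Hypothetical helper: return (rows of categories with at least min_n
    observations, rows of all other categories)."""
    # Drop rows without a category, as the notnull() filter above does
    df = df[df[cat_col].notna()]
    counts = df[cat_col].value_counts()
    large = counts[counts >= min_n].index
    is_large = df[cat_col].isin(large)
    return df[is_large], df[~is_large]

# Same example dataset as in the question
data = ([["Cheese", x] for x in np.random.normal(0.8, 0.03, 10)] +
        [["Meat", x] for x in np.random.normal(0.4, 0.05, 14)] +
        [["Bread", 0.8], ["Bread", 0.65]])
df = pd.DataFrame(data, columns=["Food", "Score"])

# Boxes for the large categories, dots for the small ones
dfnot_b, dfb = split_by_min_count(df, "Food", min_n=5)
print(sorted(dfnot_b["Food"].unique()), sorted(dfb["Food"].unique()))
```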

Then add a second axis that shares the x-axis with the first:

fig, ax = plt.subplots()
# ax2 shares the x-axis of ax but has its own y-axis
ax2 = ax.twinx()
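For context on what `twinx()` does here: it creates a second Axes that shares the x-axis with `ax` but has its own, independent y-axis. A minimal matplotlib sketch; the limits are made-up values chosen to mimic two category rows on `ax` and one on `ax2`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax2 = ax.twinx()

# x limits set on ax are propagated to ax2 because the x-axis is shared
ax.set_xlim(0, 1)

# y limits stay independent: e.g. two category rows on ax, one on ax2
ax.set_ylim(-0.5, 1.5)
ax2.set_ylim(-0.5, 0.5)

print(ax2.get_xlim(), ax.get_ylim(), ax2.get_ylim())
```

Because the y scales are independent, the "Bread" row drawn on `ax2` will generally not line up with the "Cheese"/"Meat" rows on `ax`, and its tick label appears on the right-hand side.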

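Then draw a different kind of plot on each axis:

# Boxes for the well-populated categories, individual dots for "Bread"
sns.boxplot(x="Score", y="Food", data=dfnot_b, ax=ax)
sns.stripplot(x="Score", y="Food", data=dfb, ax=ax2)
plt.show()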
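A variation that avoids the second axis (a different technique from the `twinx()` approach above): draw both plots on the same `ax` and pass the same `order` to both calls. As far as I know, seaborn's categorical functions place each level at its position in `order` and simply leave levels without data empty, so each category keeps its own row, boxes appear only for `dfnot_b` and dots only for `dfb`. The `order` list here is taken from the example; with real data it would list all categories.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove it and call plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Example dataset from the question
data = ([["Cheese", x] for x in np.random.normal(0.8, 0.03, 10)] +
        [["Meat", x] for x in np.random.normal(0.4, 0.05, 14)] +
        [["Bread", 0.8], ["Bread", 0.65]])
df = pd.DataFrame(data, columns=["Food", "Score"])

# Split as in the answer: "Bread" gets dots, everything else gets boxes
dfb = df[df["Food"] == "Bread"]
dfnot_b = df[df["Food"].notnull() & (df["Food"] != "Bread")]

# One shared category order so both calls agree on which row is which
order = ["Cheese", "Meat", "Bread"]

fig, ax = plt.subplots()
sns.boxplot(x="Score", y="Food", data=dfnot_b, order=order, ax=ax)
sns.stripplot(x="Score", y="Food", data=dfb, order=order, color="black", ax=ax)
```

Since everything shares one y-axis, the "Bread" dots land in their own third row instead of being overlaid on the other categories.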

[Figure: box plot and strip plot overlaid]
