Stacked bar chart with two categorical variables from a dataframe in Python

Question

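I have been trying to build this plot for a while, without success. I want a stacked bar chart in which the height of each bar is the percentage of students (one per row) who get free/reduced lunch (a binary variable), and the position of each bar is determined by a categorical variable (parental edu. group). The stacked segments of each bar are determined by another categorical variable (race/ethnicity).

So I would like the bars to look like this: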

Answer 1

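Below are the data and the code that produce the chart you are looking for. The steps are:

Calculate the percentage of students on free/reduced lunch. This is computed for each education group and stored in percs.
The totals list holds the count for each education group and is used to position the labels. Note that this count covers only the students on the free/reduced lunch option.
The data needs some reshaping. The dataframe is first filtered to the free/reduced lunch rows, because that is what you want to plot.
It is then grouped with groupby() by education group and then by the race column, and the rows are counted. The result is then unstacked with unstack().
Finally, the top level of the columns (lunch) is dropped so that the legend is cleaner. Try the grouping and unstacking steps on their own to see how the data changes to fit the chart.
Plot this data.
Finally, using totals and percs, add a label on top of each stacked bar to show the percentage. I used 1 decimal place, but you can adjust that as needed.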
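
To make the grouping and unstacking steps easier to follow, here is a minimal sketch that prints each intermediate result. The small frame below is only an illustrative subset in the same shape as the answer's data, not the full dataset, and the variable names free, grouped and wide are my own.

```python
import pandas as pd

# Illustrative subset only: same columns as the answer's data, fewer rows
df = pd.DataFrame({
    "parental edu. group": ["College edu"] * 4 + ["High School"] * 3,
    "race/ethnicity": ["Group A", "Group A", "Group B", "Group A",
                       "Group A", "Group B", "Group B"],
    "lunch": ["free/reduced", "free/reduced", "free/reduced", "standard",
              "free/reduced", "free/reduced", "standard"],
})

# Keep only the free/reduced lunch rows
free = df[df["lunch"] == "free/reduced"]

# Step 1: count rows per (education group, race) pair.
# The result has a two-level row index and a single 'lunch' column.
grouped = free.groupby(["parental edu. group", "race/ethnicity"]).count()
print(grouped)

# Step 2: move the race level from the rows into the columns.
# The columns are now two-level: ('lunch', 'Group A'), ('lunch', 'Group B').
wide = grouped.unstack("race/ethnicity")
print(wide)

# Step 3: drop the top 'lunch' level so only the race names remain
wide.columns = wide.columns.droplevel()
print(wide)
```

After step 3 there is one row per education group and one column per race group, which is the layout that DataFrame.plot.bar(stacked=True) expects.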

Note: I am using Python 3.8.8 and matplotlib 3.3.4. If you have matplotlib 3.4.2 or later, you can use matplotlib's bar_label(), which may reduce the work needed to place the text.
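
For readers on a newer matplotlib, the bar_label() route might look roughly like the sketch below. It is not part of the original answer's code. The counts and percentages are typed in by hand from the data in this answer, and labelling only the last container is my own choice for getting one label per stack.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Free/reduced lunch counts per education group and race, from this answer's data
counts = pd.DataFrame(
    {"Group A": [3, 2, 1], "Group B": [2, 1, 1]},
    index=pd.Index(["College edu", "High School", "No edu"],
                   name="parental edu. group"),
)
# Share of free/reduced lunch students in each education group, in percent
percs = [50.0, 60.0, 50.0]

fig, ax = plt.subplots(figsize=(8, 6))
counts.plot.bar(stacked=True, rot=0, ax=ax)

# ax.containers holds one BarContainer per column. The last one is the
# topmost segment of every stack, so labelling it puts the text on top.
texts = ax.bar_label(
    ax.containers[-1],
    labels=[f"{p}%" for p in percs],
    padding=3,
    weight="bold",
)
# Call plt.show() or fig.savefig(...) to view the result
```

This replaces the manual ax.text() loop, so the label position no longer needs to be computed from totals.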

Hope this is what you are looking for.

My data

>> df
    parental edu. group  race/ethnicity  lunch
0   College edu          Group A         free/reduced
1   College edu          Group A         free/reduced
2   College edu          Group A         free/reduced
3   College edu          Group B         free/reduced
4   College edu          Group B         free/reduced
5   College edu          Group B         standard
6   College edu          Group B         standard
7   College edu          Group A         standard
8   College edu          Group A         standard
9   College edu          Group A         standard
10  High School          Group A         free/reduced
11  High School          Group A         free/reduced
12  High School          Group B         free/reduced
13  High School          Group A         standard
14  High School          Group B         standard
15  No edu               Group B         standard
16  No edu               Group A         standard
17  No edu               Group B         free/reduced
18  No edu               Group A         free/reduced
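
The answer shows only the printed frame. One way to rebuild it for experimenting is sketched below. The rows helper list is my own construction, and the row order follows the printout above.

```python
import pandas as pd

# (education group, race, lunch) tuples in the same order as the printed frame
rows = (
    [("College edu", "Group A", "free/reduced")] * 3
    + [("College edu", "Group B", "free/reduced")] * 2
    + [("College edu", "Group B", "standard")] * 2
    + [("College edu", "Group A", "standard")] * 3
    + [("High School", "Group A", "free/reduced")] * 2
    + [("High School", "Group B", "free/reduced")]
    + [("High School", "Group A", "standard")]
    + [("High School", "Group B", "standard")]
    + [("No edu", "Group B", "standard")]
    + [("No edu", "Group A", "standard")]
    + [("No edu", "Group B", "free/reduced")]
    + [("No edu", "Group A", "free/reduced")]
)

df = pd.DataFrame(
    rows, columns=["parental edu. group", "race/ethnicity", "lunch"]
)
print(df)
```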

Code

import matplotlib.pyplot as plt

percs = []   # percentage of free/reduced lunch students per education group
totals = []  # number of free/reduced lunch students per education group

# Iterate in sorted order so both lists line up with the bars
# (groupby() below sorts the education groups the same way)
for ch in sorted(df['parental edu. group'].unique()):
    in_group = df['parental edu. group'] == ch
    free = in_group & (df['lunch'] == 'free/reduced')
    percs.append(round(free.sum() / in_group.sum() * 100, 1))
    totals.append(free.sum())

# Keep the free/reduced rows, count them per (education group, race) and unstack race into columns
counts = df[df['lunch'] == 'free/reduced'].groupby(['parental edu. group', 'race/ethnicity']).count().unstack('race/ethnicity')

# Drop the top column level ('lunch') so the legend shows only the race groups
counts.columns = counts.columns.droplevel()

# Plot the graph
fig, ax = plt.subplots(figsize=(8, 6))
counts.plot.bar(stacked=True, rot=0, ax=ax)

# Add a label above each stacked bar.
# The position comes from the totals list and the text from the percs list
for i, total in enumerate(totals):
    ax.text(i, total + 0.1, str(percs[i]) + "%", ha='center', weight='bold')

plt.show()
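
The code in this answer draws counts and writes the percentage as a label. If the bar height itself should be the percentage, as the question describes, one possible variation is to divide each count by the size of its education group before plotting. This is only a sketch and not the answer's method. The counts and group sizes are typed in from the data in this answer, and the names group_sizes and shares are my own.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Free/reduced lunch counts per education group and race, from this answer's data
counts = pd.DataFrame(
    {"Group A": [3, 2, 1], "Group B": [2, 1, 1]},
    index=pd.Index(["College edu", "High School", "No edu"],
                   name="parental edu. group"),
)
# Total number of students (free/reduced and standard) in each education group
group_sizes = pd.Series([10, 5, 4], index=counts.index)

# Each segment becomes that race group's share of the whole education group,
# so the segments of one bar add up to the group's free/reduced percentage
shares = counts.div(group_sizes, axis=0) * 100

fig, ax = plt.subplots(figsize=(8, 6))
shares.plot.bar(stacked=True, rot=0, ax=ax)
ax.set_ylabel("students on free/reduced lunch (%)")
# Call plt.show() or fig.savefig(...) to view the result
```

With this layout the total height of each bar is the group's free/reduced percentage, so the label position would be the row sum of shares rather than totals.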
