Stacked barplot with two categorical variables from dataframe, Python
How to create a 100% stacked barplot from a categorical dataframe
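I have a dataframe structured like this:
User | Food 1 | Food 2 | Food 3 | Food 4 |
---|---|---|---|---|
Steph | Onions | Tomatoes | Cabbages | Potatoes |
Tom | Potatoes | Tomatoes | Potatoes | Potatoes |
Fred | Carrots | Cabbages | Eggplant | |
Phil | Onions | Eggplant | Eggplant | |
I want to use the distinct values in the Food columns as categories. I then want to create a Seaborn plot in which the percentage of each category in each column is drawn as a 100% horizontal stacked bar.
I tried this: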
import pandas as pd
import matplotlib.pyplot as plt

data = {
    'User' : ['Steph', 'Tom', 'Fred', 'Phil'],
    'Food 1' : ["Onions", "Potatoes", "Carrots", "Onions"],
    'Food 2' : ['Tomatoes', 'Tomatoes', 'Cabbages', 'Eggplant'],
    'Food 3' : ["Cabbages", "Potatoes", "", "Eggplant"],
    'Food 4' : ['Potatoes', 'Potatoes', 'Eggplant', ''],
}
df = pd.DataFrame(data)
x_ax = ["Onions", "Potatoes", "Carrots", "Onions", "", 'Eggplant', "Cabbages"]
df.plot(kind="barh", x=x_ax, y=["Food 1", "Food 2", "Food 3", "Food 4"], stacked=True, ax=axes[1])
plt.show()
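- Replace `''` with `np.nan`, because empty strings would otherwise be counted as values.
- Use `pandas.DataFrame.melt` to convert the dataframe to long form.
- Use `pandas.crosstab` to get a table of frequency counts.
- Plot the dataframe with `pandas.DataFrame.plot` and `kind='barh'`.
- `seaborn` is a high-level API for `matplotlib`, and `pandas` uses `matplotlib` as its default plotting backend. A stacked bar plot is easier to produce with `pandas`.
- `seaborn` does not support stacked bar plots unless `histplot` is used in a hacky way, as shown in this answer, and that would require an extra step of melting `percent`.
- Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1.
- Assignment expressions (`:=`) require python >= 3.8. Otherwise, use `[f'{v.get_width():.2f}%' if v.get_width() > 0 else '' for v in c ]`.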
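import pandas as pd
import numpy as np

# using the dataframe in the OP
# 1. replace empty strings with NaN so they are not counted as a category
df = df.replace('', np.nan)
# 2. convert the dataframe to long form
dfm = df.melt(id_vars='User', var_name='Food', value_name='Type')
# 3. count how often each Type occurs in each Food column
ct = pd.crosstab(dfm.Food, dfm.Type)
# 4. number of non-missing entries per Food column
total = ct.sum(axis=1)
# 5. convert the counts to percentages of each row total
percent = ct.div(total, axis=0).mul(100).round(2)
# 6. plot as horizontal stacked bars
ax = percent.plot(kind='barh', stacked=True, figsize=(8, 6))
# 7. annotate each bar segment
for c in ax.containers:
    # customize the label to account for cases when there might not be a bar section
    labels = [f'{w:.2f}%' if (w := v.get_width()) > 0 else '' for v in c ]
    # set the bar label
    ax.bar_label(c, labels=labels, label_type='center')
# 8. move the legend outside the plot area
ax.legend(bbox_to_anchor=(1, 1.02), loc='upper left')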
dfm
     User    Food      Type
0   Steph  Food 1    Onions
1     Tom  Food 1  Potatoes
2    Fred  Food 1   Carrots
3    Phil  Food 1    Onions
4   Steph  Food 2  Tomatoes
5     Tom  Food 2  Tomatoes
6    Fred  Food 2  Cabbages
7    Phil  Food 2  Eggplant
8   Steph  Food 3  Cabbages
9     Tom  Food 3  Potatoes
10   Fred  Food 3       NaN
11   Phil  Food 3  Eggplant
12  Steph  Food 4  Potatoes
13    Tom  Food 4  Potatoes
14   Fred  Food 4  Eggplant
15   Phil  Food 4       NaN
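The `NaN` entries in rows 10 and 15 are what keep the blank cells out of the counts. As a small illustration of why the replacement step matters, the sketch below runs the same melt and crosstab on the question's data twice, once with the empty strings left in and once with them replaced. The names `raw`, `count_types`, `ct_blank` and `ct_nan` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# same data as in the question
data = {
    'User': ['Steph', 'Tom', 'Fred', 'Phil'],
    'Food 1': ['Onions', 'Potatoes', 'Carrots', 'Onions'],
    'Food 2': ['Tomatoes', 'Tomatoes', 'Cabbages', 'Eggplant'],
    'Food 3': ['Cabbages', 'Potatoes', '', 'Eggplant'],
    'Food 4': ['Potatoes', 'Potatoes', 'Eggplant', ''],
}
raw = pd.DataFrame(data)


def count_types(frame):
    """Melt to long form and count each Type per Food column (hypothetical helper)."""
    long_form = frame.melt(id_vars='User', var_name='Food', value_name='Type')
    return pd.crosstab(long_form['Food'], long_form['Type'])


ct_blank = count_types(raw)                     # '' is kept and becomes its own category
ct_nan = count_types(raw.replace('', np.nan))   # NaN rows are left out of the counts

print('' in ct_blank.columns)            # True: the blank shows up as a column
print(ct_blank.sum(axis=1).tolist())     # [4, 4, 4, 4]
print(ct_nan.sum(axis=1).tolist())       # [4, 4, 3, 3]
```

With the blanks left in, every row total is 4, so the percentages for `Food 3` and `Food 4` would be computed against the wrong denominator and an unnamed segment would appear in the plot.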
ct
Type    Cabbages  Carrots  Eggplant  Onions  Potatoes  Tomatoes
Food
Food 1         0        1         0       2         1         0
Food 2         1        0         1       0         0         2
Food 3         1        0         1       0         1         0
Food 4         0        0         1       0         2         0
total
Food
Food 1    4
Food 2    4
Food 3    3
Food 4    3
dtype: int64
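The row totals are only needed to turn the counts into percentages. As a possible shortcut, `pandas.crosstab` accepts `normalize='index'`, which divides each row by its own total, so the count, total and divide steps can be folded into one call. This is a sketch of an alternative, not the code from the answer; it rebuilds the question's data so that it runs on its own.

```python
import numpy as np
import pandas as pd

# same data as in the question, with blanks replaced by NaN
data = {
    'User': ['Steph', 'Tom', 'Fred', 'Phil'],
    'Food 1': ['Onions', 'Potatoes', 'Carrots', 'Onions'],
    'Food 2': ['Tomatoes', 'Tomatoes', 'Cabbages', 'Eggplant'],
    'Food 3': ['Cabbages', 'Potatoes', '', 'Eggplant'],
    'Food 4': ['Potatoes', 'Potatoes', 'Eggplant', ''],
}
df = pd.DataFrame(data).replace('', np.nan)
dfm = df.melt(id_vars='User', var_name='Food', value_name='Type')

# normalize='index' returns row-wise proportions; scale them to percentages
percent = pd.crosstab(dfm['Food'], dfm['Type'], normalize='index').mul(100).round(2)
print(percent)
```

The result should match the `percent` table produced by the explicit `ct.div(total, axis=0)` version, and it can be passed to `.plot(kind='barh', stacked=True)` in the same way.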
percent
Type    Cabbages  Carrots  Eggplant  Onions  Potatoes  Tomatoes
Food
Food 1      0.00     25.0      0.00    50.0     25.00       0.0
Food 2     25.00      0.0     25.00     0.0      0.00      50.0
Food 3     33.33      0.0     33.33     0.0     33.33       0.0
Food 4      0.00      0.0     33.33     0.0     66.67       0.0