How to create stacked bar chart from a multi-level pivot table
I've created a pivot table with two levels of columns:
pivotCust = bigData.pivot_table(index=['month'],columns=['year','usertype'],values='start_time',aggfunc = 'count')
This creates the table that I'm interested in:
year        2019            2020            2021
usertype  casual  member  casual  member  casual  member
month
1           4602   98670    7785  136099   18117   78717
2           2638   93548   12870  126715   10131   39491
3          15923  149688   27825  115593   84033  144463
4          47744  217566   23628   61148  136601  200629
5          81624  285834   86909  113365  256916  274717
6         130218  345177  154718  188287  370681  358914
7         175632  381683  269296  282184  442056  380354
8         186889  403295  289661  332700  412671  391681
9         129173  364046  230692  302266  363890  392257
10         71035  300751  145012  243641  257242  373984
11         18729  158447   88099  171617  106929  253049
12         16430  138662   30080  101493   69738  177802
But when I try to turn it into a bar graph (with the code below), it's hard to read, as it creates 72 columns: six entries per month (casual/member * 3 years), for 12 months.
[Figure: graph with six entries per month]
pivotCust.plot(kind = 'bar',figsize=(17,10))
I'd like to turn this into a stacked graph, with three columns per month (one per year) and the casual/member data stacked within each column. But when I use the stacked=True flag, I get a graph of 12 columns, with all the data stacked together.
pivotCust.plot(kind = 'bar',stacked = True, figsize=(17,10))
I think .melt or .unstack might be what I need to use to fix this, but I can't figure out how to use them correctly.
The answer here suggests that Seaborn might be useful, but, again, I can't figure out how to get it to produce the graph I want.
Any suggestions would be greatly appreciated.
There might be an easier approach, but I think the difficulty comes from the fact that you want to group your columns by month, stratified by year, and then further stratified by usertype. Seaborn's barplot makes it easy to stratify by one level using hue, but I don't know how to stratify by two levels as you need here.
Instead, as a hack, I'm first plotting the sum of both user types, and then plotting just the member values on top. I'd argue that a lineplot would be easier to interpret. I've included one below the code.
I also melted your table to make seaborn happier:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import io #just used for reading in the example table
pivotCust = pd.read_csv(io.StringIO("""
1 4602 98670 7785 136099 18117 78717
2 2638 93548 12870 126715 10131 39491
3 15923 149688 27825 115593 84033 144463
4 47744 217566 23628 61148 136601 200629
5 81624 285834 86909 113365 256916 274717
6 130218 345177 154718 188287 370681 358914
7 175632 381683 269296 282184 442056 380354
8 186889 403295 289661 332700 412671 391681
9 129173 364046 230692 302266 363890 392257
10 71035 300751 145012 243641 257242 373984
11 18729 158447 88099 171617 106929 253049
12 16430 138662 30080 101493 69738 177802"""
), sep=r'\s+', header=None, index_col=0)  # sep=r'\s+' replaces delim_whitespace=True, which is deprecated in recent pandas
pivotCust.index.name = 'month'
pivotCust.columns = pd.MultiIndex.from_product([
    [2019,2020,2021],
    ['casual','member'],
], names=['year','usertype'])
plot_df = pivotCust.reset_index().melt(id_vars='month')
plot_df['casual_member_sum'] = plot_df.groupby(['month','year'])['value'].transform('sum')
fig,ax = plt.subplots()

#Plot the sum of the two categories as background bars
sns.barplot(
    x = 'month',
    y = 'casual_member_sum',
    palette = 'Blues',
    hue = 'Total '+plot_df['year'].astype(str),
    ax = ax,
    data = plot_df,
)

#Plot just the members as foreground bars
sns.barplot(
    x = 'month',
    y = 'value',
    palette = 'Reds',
    hue = 'Member '+plot_df['year'].astype(str),
    ax = ax,
    data = plot_df[plot_df['usertype'].eq('member')],
)

plt.show()
plt.close()
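To make the reshaping step easier to follow, here is a minimal sketch of what the long-form table looks like, using a cut-down copy of the pivot table (months 1 and 2, years 2019 and 2020). It uses .unstack(), which the question mentions, in place of .melt(); the names toy and long_df are placeholders for this sketch only.

```python
import pandas as pd

# Toy pivot table with the same two-level column layout as pivotCust
# (values copied from months 1-2 of the table in the question)
columns = pd.MultiIndex.from_product(
    [[2019, 2020], ["casual", "member"]], names=["year", "usertype"]
)
toy = pd.DataFrame(
    [[4602, 98670, 7785, 136099], [2638, 93548, 12870, 126715]],
    index=pd.Index([1, 2], name="month"),
    columns=columns,
)

# unstack() on a frame with a single-level index returns a Series indexed by
# (year, usertype, month); reset_index() turns those levels into columns
long_df = toy.unstack().rename("value").reset_index()

# transform('sum') repeats the casual + member total on both rows
# of each (month, year) group
long_df["casual_member_sum"] = (
    long_df.groupby(["month", "year"])["value"].transform("sum")
)
print(long_df)
```

Each row now holds one (year, usertype, month) combination, which is the one-observation-per-row shape that seaborn's x, y, and hue arguments expect.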
Here's the lineplot approach with seaborn, using the same plot_df created above. The lineplot is easy to make too:
sns.lineplot(
    x = 'month',
    y = 'value',
    hue = 'year',
    style = 'usertype',
    data = plot_df,
)
plt.show()
plt.close()
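If you do want the chart described in the question (one bar per year within each month, with casual and member stacked inside it), one option is to draw the bars directly with matplotlib, shifting each year sideways and passing bottom= for the stack. This is only a rough sketch and not part of the seaborn approach above: the function name plot_grouped_stacked, the group_width parameter, and the colour/alpha scheme are my own choices, and it assumes the column levels are named 'year' and 'usertype' as in pivotCust.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_grouped_stacked(pivot, ax=None, group_width=0.8):
    """Draw one bar per (month, year) with the usertype values stacked in it.

    Assumes months in the index and column levels named 'year' and 'usertype'.
    """
    if ax is None:
        _, ax = plt.subplots(figsize=(17, 10))
    years = pivot.columns.get_level_values("year").unique()
    usertypes = pivot.columns.get_level_values("usertype").unique()
    bar_width = group_width / len(years)
    x = np.arange(len(pivot.index))
    # Arbitrary styling: colour encodes usertype, opacity encodes year
    alphas = np.linspace(0.4, 1.0, len(years))

    for i, year in enumerate(years):
        # Shift each year's bar so the years sit side by side within a month
        offset = (i - (len(years) - 1) / 2) * bar_width
        bottom = np.zeros(len(pivot.index))
        for j, usertype in enumerate(usertypes):
            heights = pivot[(year, usertype)].to_numpy(dtype=float)
            ax.bar(
                x + offset, heights, bar_width,
                bottom=bottom,           # stack on top of the previous usertype
                color=f"C{j}", alpha=alphas[i], edgecolor="white",
                label=f"{year} {usertype}",
            )
            bottom += heights

    ax.set_xticks(x)
    ax.set_xticklabels(pivot.index)
    ax.set_xlabel(pivot.index.name or "")
    ax.set_ylabel("count")
    ax.legend()
    return ax


# Small stand-in for pivotCust (months 1-2, years 2019-2020 from the question)
toy = pd.DataFrame(
    [[4602, 98670, 7785, 136099], [2638, 93548, 12870, 126715]],
    index=pd.Index([1, 2], name="month"),
    columns=pd.MultiIndex.from_product(
        [[2019, 2020], ["casual", "member"]], names=["year", "usertype"]
    ),
)
ax = plot_grouped_stacked(toy)
```

With the full table you would call plot_grouped_stacked(pivotCust) followed by plt.show(), which should give 12 groups of three stacked bars.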