
How to create stacked bar chart from a multi-level pivot table

I've created a pivot table with two levels of columns:

pivotCust = bigData.pivot_table(index=['month'],columns=['year','usertype'],values='start_time',aggfunc = 'count')

This creates the table that I'm interested in:

year        2019            2020            2021
usertype  casual  member  casual  member  casual  member
month
1           4602   98670    7785  136099   18117   78717
2           2638   93548   12870  126715   10131   39491
3          15923  149688   27825  115593   84033  144463
4          47744  217566   23628   61148  136601  200629
5          81624  285834   86909  113365  256916  274717
6         130218  345177  154718  188287  370681  358914
7         175632  381683  269296  282184  442056  380354
8         186889  403295  289661  332700  412671  391681
9         129173  364046  230692  302266  363890  392257
10         71035  300751  145012  243641  257242  373984
11         18729  158447   88099  171617  106929  253049
12         16430  138662   30080  101493   69738  177802

But when I try to turn it into a bar graph (with the code below), it's hard to read, as it creates 72 columns: six entries per month (casual/member * 3 years), for 12 months. (Image: graph with six entries per month)

pivotCust.plot(kind = 'bar',figsize=(17,10))

I'd like to turn this into a stacked graph, with three columns per month (one per year) and the casual/member data stacked within each column. But when I use the 'stacked = True' flag, I get a graph of 12 columns, with all the data stacked together.

pivotCust.plot(kind = 'bar',stacked = True, figsize=(17,10))

I think .melt or .unstack might be what I need to fix this, but I can't figure out how to use them correctly.

The answer here suggests that Seaborn might be useful, but, again, I can't figure out how to get it to produce the graph I want.

Any suggestions would be greatly appreciated.

There might be an easier approach, but I think the difficulty comes from the fact that you want to group your columns by month, stratified by year, and then further stratified by usertype. Seaborn's barplot makes it easy to stratify by one level using hue, but I don't know how to stratify by two levels like you need here.

Instead, as a hack, I'm first plotting the sum of both user types, and then plotting just the member values on top. I'd argue that a lineplot would be easier to interpret, so I've included one below the code.

I also melted your table to make seaborn happier.
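To make the "melted" layout concrete, here is a minimal sketch of what a long-format version of the pivot table looks like. It uses DataFrame.unstack() rather than melt, which is one way to get an equivalent layout (the column order differs). The small `pivot` frame is a stand-in built from the first two rows of the table in the question, and the names `pivot` and `long_df` are only for illustration.

```python
import pandas as pd

# Tiny stand-in for pivotCust: two months, two years, two user types
# (values copied from the first two rows of the table in the question).
columns = pd.MultiIndex.from_product(
    [[2019, 2020], ["casual", "member"]], names=["year", "usertype"]
)
pivot = pd.DataFrame(
    [[4602, 98670, 7785, 136099], [2638, 93548, 12870, 126715]],
    index=pd.Index([1, 2], name="month"),
    columns=columns,
)

# On a frame with a single-level index, unstack() moves every column level
# into the row index and returns a Series. reset_index() then turns those
# index levels into ordinary columns, giving one row per observation.
long_df = pivot.unstack().rename("value").reset_index()
print(long_df)
```

Each row of `long_df` now holds one (year, usertype, month, value) combination, which is the shape seaborn's `x`, `y` and `hue` arguments expect.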


import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import io #just used for reading in the example table

pivotCust = pd.read_csv(io.StringIO("""
1   4602    98670   7785    136099  18117   78717
2   2638    93548   12870   126715  10131   39491
3   15923   149688  27825   115593  84033   144463
4   47744   217566  23628   61148   136601  200629
5   81624   285834  86909   113365  256916  274717
6   130218  345177  154718  188287  370681  358914
7   175632  381683  269296  282184  442056  380354
8   186889  403295  289661  332700  412671  391681
9   129173  364046  230692  302266  363890  392257
10  71035   300751  145012  243641  257242  373984
11  18729   158447  88099   171617  106929  253049
12  16430   138662  30080   101493  69738   177802"""
), sep=r'\s+', header=None, index_col=0)  # sep=r'\s+' splits on whitespace; delim_whitespace=True is deprecated in recent pandas

pivotCust.index.name = 'month'

pivotCust.columns = pd.MultiIndex.from_product([
    [2019,2020,2021],
    ['casual','member'],
], names=['year','usertype'])


plot_df = pivotCust.reset_index().melt(id_vars='month')
plot_df['casual_member_sum'] = plot_df.groupby(['month','year'])['value'].transform('sum')

fig,ax = plt.subplots()

#Plot the sum of the two categories as background bars
sns.barplot(
    x = 'month',
    y = 'casual_member_sum',
    palette = 'Blues',
    hue = 'Total '+plot_df['year'].astype(str),
    ax = ax,
    data = plot_df,
)

#Plot just the members as foreground bars
sns.barplot(
    x = 'month',
    y = 'value',
    palette = 'Reds',
    hue = 'Member '+plot_df['year'].astype(str),
    ax = ax,
    data = plot_df[plot_df['usertype'].eq('member')],
)

plt.show()
plt.close()
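The overlay above gets a stacked look by drawing the member bars in front of the totals. As an alternative, here is a rough sketch that stacks the segments explicitly with matplotlib's `bottom` argument, giving one bar per year within each month. The helper name `plot_grouped_stacked` and its `group_width` parameter are hypothetical, and the sketch assumes a pivot table shaped like pivotCust (a month index and column levels named 'year' and 'usertype'). A two-month stand-in table is used so the block runs on its own.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_grouped_stacked(pivot, ax=None, group_width=0.8):
    """Draw one bar per (month, year) with the user types stacked inside it.

    Assumes `pivot` has a single-level index and two column levels named
    'year' and 'usertype', like pivotCust in the question.
    """
    if ax is None:
        _, ax = plt.subplots(figsize=(17, 10))
    years = list(pivot.columns.get_level_values("year").unique())
    usertypes = list(pivot.columns.get_level_values("usertype").unique())
    x = np.arange(len(pivot.index))        # one slot per month
    bar_width = group_width / len(years)   # the years share each slot

    for i, year in enumerate(years):
        # Shift each year's bar sideways so the years sit side by side.
        offset = (i - (len(years) - 1) / 2) * bar_width
        bottom = np.zeros(len(pivot.index))
        for usertype in usertypes:
            heights = pivot[(year, usertype)].to_numpy(dtype=float)
            ax.bar(x + offset, heights, bar_width, bottom=bottom,
                   label=f"{year} {usertype}")
            bottom = bottom + heights      # next segment starts on top

    ax.set_xticks(x)
    ax.set_xticklabels(pivot.index)
    ax.set_xlabel(pivot.index.name or "")
    ax.legend()
    return ax


# Stand-in data: first two rows and first two years of the question's table.
columns = pd.MultiIndex.from_product(
    [[2019, 2020], ["casual", "member"]], names=["year", "usertype"]
)
pivot = pd.DataFrame(
    [[4602, 98670, 7785, 136099], [2638, 93548, 12870, 126715]],
    index=pd.Index([1, 2], name="month"),
    columns=columns,
)
ax = plot_grouped_stacked(pivot)
```

With the real data this would be called as `plot_grouped_stacked(pivotCust)` followed by `plt.show()`. The default color cycle gives each year/usertype pair its own color; passing explicit colors to `ax.bar` would be needed for a more deliberate palette.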

Here's the lineplot approach with seaborn, using the same plot_df created above. The lineplot is easy to make too.


sns.lineplot(
    x = 'month',
    y = 'value',
    hue = 'year',
    style = 'usertype',
    data = plot_df,
)
plt.show()
plt.close()
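If a single crowded axes is the main problem, another option is to draw one stacked bar chart per year side by side, using pandas' own `stacked=True` on each year's slice of the pivot table. This is only a sketch under the same assumptions as before: the `pivot` frame is a two-month stand-in for pivotCust, and the figure size is arbitrary.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data: first two rows and first two years of the question's table.
columns = pd.MultiIndex.from_product(
    [[2019, 2020], ["casual", "member"]], names=["year", "usertype"]
)
pivot = pd.DataFrame(
    [[4602, 98670, 7785, 136099], [2638, 93548, 12870, 126715]],
    index=pd.Index([1, 2], name="month"),
    columns=columns,
)

years = pivot.columns.get_level_values("year").unique()

# One panel per year, sharing the y axis so bar heights stay comparable.
fig, axes = plt.subplots(1, len(years), figsize=(17, 5),
                         sharey=True, squeeze=False)
for ax, year in zip(axes[0], years):
    # pivot[year] drops the 'year' level, leaving casual/member columns,
    # so stacked=True stacks only the two user types for that year.
    pivot[year].plot(kind="bar", stacked=True, ax=ax, title=str(year))
```

This trades the single-axes comparison across years for simpler code; call `plt.show()` afterwards to display the figure.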
