
Plot group bar charts with matplotlib or Seaborn with Datetime Index in Python

I have a Pandas DataFrame that consists of a date column and a category column of interest. I would like to see the frequency count of each category for each month. When I tried this with matplotlib, the result looked quite bad.

Here is what the frame looks like when grouped by month:

df.resample("M")["category_col"].value_counts(normalize=True).mul(100)

Output

date                         category_col      
2019-12-31  A                41.929004
            B                25.758765
            C                17.752111
            D                9.189919
            E                3.625122
            F                1.745080
2020-01-31  A                54.052744
            C                16.347271
            B                14.414431
            D                11.677537
            E                2.675607
            F                0.832411
2020-02-29  A                48.928468
            B                22.011116
            C                14.084507
            D                11.729162
            E                2.193272
            F                1.053475
2020-03-31  A                54.435410
            D                15.718065
            C                14.577060
            B                11.335682
            E                2.884205
            F                1.049578
Name: category_col, dtype: float64

Here is my attempt:

df.date = pd.to_datetime(df.date)
df.set_index("date", inplace=True)
df.resample("M")["category_col"].value_counts(normalize=True).mul(100).plot(kind="bar")

See the output below:

[image not available]

Here is what I want:

[image not available]

I think you need Series.unstack, followed by rename to format the datetimes as month name and year:

df.date = pd.to_datetime(df.date)
df = df.set_index("date")

s = df.resample("M")["category_col"].value_counts(normalize=True).mul(100)

s.unstack().rename(lambda x: x.strftime('%B %Y')).plot(kind="bar")

Sample:

print (s)
date        category_col
2019-12-31  A               41.929004
            B               25.758765
            C               17.752111
            D                9.189919
            E                3.625122
            F                1.745080
2020-01-31  A               54.052744
            C               16.347271
            B               14.414431
            D               11.677537
            E                2.675607
            F                0.832411
2020-02-29  A               48.928468
            B               22.011116
            C               14.084507
            D               11.729162
            E                2.193272
            F                1.053475
2020-03-31  A               54.435410
            D               15.718065
            C               14.577060
            B               11.335682
            E                2.884205
            F                1.049578
Name: A, dtype: float64

print (s.unstack())
category_col          A          B          C          D         E         F
date                                                                        
2019-12-31    41.929004  25.758765  17.752111   9.189919  3.625122  1.745080
2020-01-31    54.052744  14.414431  16.347271  11.677537  2.675607  0.832411
2020-02-29    48.928468  22.011116  14.084507  11.729162  2.193272  1.053475
2020-03-31    54.435410  11.335682  14.577060  15.718065  2.884205  1.049578

print (s.unstack().rename(lambda x: x.strftime('%B %Y')))
category_col           A          B          C          D         E         F
date                                                                         
December 2019  41.929004  25.758765  17.752111   9.189919  3.625122  1.745080
January 2020   54.052744  14.414431  16.347271  11.677537  2.675607  0.832411
February 2020  48.928468  22.011116  14.084507  11.729162  2.193272  1.053475
March 2020     54.435410  11.335682  14.577060  15.718065  2.884205  1.049578
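The steps above can be tried end to end with a small self-contained sketch. The sample rows below are made up for illustration, and the monthly grouping uses groupby on dt.to_period("M") instead of resample("M"), since the "M" resample alias is deprecated in recent pandas releases; this is a swapped-in technique, not the exact code from the answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical sample data: one row per event, with a date and a category.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2019-12-05", "2019-12-20", "2019-12-28",
         "2020-01-03", "2020-01-15", "2020-01-30", "2020-01-31"]
    ),
    "category_col": ["A", "A", "B", "A", "B", "B", "C"],
})

# Percentage share of each category per calendar month.
# Assumption: grouping by monthly period stands in for resample("M").
s = (
    df.groupby(df["date"].dt.to_period("M"))["category_col"]
      .value_counts(normalize=True)
      .mul(100)
)

# One row per month, one column per category; categories that do not
# occur in a month are filled with 0 instead of NaN.
wide = s.unstack(fill_value=0)

# Replace the period labels with "Month Year" strings for the x axis.
wide = wide.rename(index=lambda p: p.strftime("%B %Y"))

# Each row becomes a group of bars, each column one bar within the group.
ax = wide.plot(kind="bar", rot=0)
ax.set_ylabel("share of rows (%)")
```

Passing stacked=True to plot would stack the categories in a single bar per month instead of placing them side by side.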

First of all, to get the names of the months, reset the index and select the right columns:

# Make "date" a regular column first (it is the index after set_index)
df = df.reset_index()

df['month'] = df['date'].apply(lambda x: pd.Timestamp(x).strftime('%B'))

df = df[['month','category_col','value']]

Then, assuming that you have a DataFrame (called df) like this:

month       category_col     value      
September   A                41.929004
September   B                25.758765
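One possible way to arrive at a frame of this shape from the monthly percentage Series in the question is sketched below. The numbers are invented, and the column name value and the variable long_df are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical monthly percentages, shaped like the Series in the question:
# a MultiIndex of (date, category_col) and one percentage per pair.
idx = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2019-12-31"), "A"),
        (pd.Timestamp("2019-12-31"), "B"),
        (pd.Timestamp("2020-01-31"), "A"),
        (pd.Timestamp("2020-01-31"), "B"),
    ],
    names=["date", "category_col"],
)
s = pd.Series([60.0, 40.0, 75.0, 25.0], index=idx, name="category_col")

# Rename the Series before reset_index: its name clashes with the
# "category_col" index level, so the values need their own column name.
long_df = s.rename("value").reset_index()

# Derive the month name from the datetime column.
long_df["month"] = long_df["date"].dt.strftime("%B")

# Keep only the three columns used for plotting.
long_df = long_df[["month", "category_col", "value"]]
```

A frame like long_df has one row per month and category, which is the long format that the hue argument of sns.barplot works with.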

Run the following to get the plot you are looking for, using Seaborn:

import seaborn as sns 
ax = sns.barplot(x="month", y="value", hue="category_col", data=df)
