Plot grouped bar charts with matplotlib or Seaborn with a Datetime Index in Python
I have a Pandas DataFrame that consists of a date column and a category column of interest. I would like to see the frequency count for each month.
When I did this with matplotlib, I got something that looks quite bad.
Here is what the frame looks like when grouped by month:
df.resample("M")["category_col"].value_counts(normalize=True).mul(100)
Output
date category_col
2019-12-31 A 41.929004
B 25.758765
C 17.752111
D 9.189919
E 3.625122
F 1.745080
2020-01-31 A 54.052744
C 16.347271
B 14.414431
D 11.677537
E 2.675607
F 0.832411
2020-02-29 A 48.928468
B 22.011116
C 14.084507
D 11.729162
E 2.193272
F 1.053475
2020-03-31 A 54.435410
D 15.718065
C 14.577060
B 11.335682
E 2.884205
F 1.049578
Name: category_col, dtype: float64
Here is my attempt:
df.date = pd.to_datetime(df.date)
df.set_index("date", inplace=True)
df.resample("M")["category_col"].value_counts(normalize=True).mul(100).plot(kind="bar")
See the output below:
Here is what I want:
I think you need Series.unstack to turn the categories into columns, with rename to format the datetimes as month name and year:
df.date = pd.to_datetime(df.date)
df = df.set_index("date")
s = df.resample("M")["category_col"].value_counts(normalize=True).mul(100)
s.unstack().rename(lambda x: x.strftime('%B %Y')).plot(kind="bar")
Sample:
print (s)
date category_col
2019-12-31 A 41.929004
B 25.758765
C 17.752111
D 9.189919
E 3.625122
F 1.745080
2020-01-31 A 54.052744
C 16.347271
B 14.414431
D 11.677537
E 2.675607
F 0.832411
2020-02-29 A 48.928468
B 22.011116
C 14.084507
D 11.729162
E 2.193272
F 1.053475
2020-03-31 A 54.435410
D 15.718065
C 14.577060
B 11.335682
E 2.884205
F 1.049578
Name: A, dtype: float64
print (s.unstack())
category_col A B C D E F
date
2019-12-31 41.929004 25.758765 17.752111 9.189919 3.625122 1.745080
2020-01-31 54.052744 14.414431 16.347271 11.677537 2.675607 0.832411
2020-02-29 48.928468 22.011116 14.084507 11.729162 2.193272 1.053475
2020-03-31 54.435410 11.335682 14.577060 15.718065 2.884205 1.049578
print (s.unstack().rename(lambda x: x.strftime('%B %Y')))
category_col A B C D E F
date
December 2019 41.929004 25.758765 17.752111 9.189919 3.625122 1.745080
January 2020 54.052744 14.414431 16.347271 11.677537 2.675607 0.832411
February 2020 48.928468 22.011116 14.084507 11.729162 2.193272 1.053475
March 2020 54.435410 11.335682 14.577060 15.718065 2.884205 1.049578
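The unstack-and-rename approach above can be tried end to end on a small frame. This is only a minimal sketch under stated assumptions: the data is synthetic, `plot_grouped` is a hypothetical helper name, and the grouping uses `dt.to_period("M")` instead of `resample("M")` (a substitution, not the answer's exact code) because recent pandas releases deprecate the "M" resample alias in favour of "ME".

```python
import pandas as pd

# Synthetic data (hypothetical): one row per event, with a date and a category.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-12-05", "2019-12-17", "2019-12-20", "2019-12-28",
        "2020-01-03", "2020-01-09", "2020-01-21", "2020-01-30",
    ]),
    "category_col": ["A", "A", "B", "C", "A", "A", "A", "B"],
})

# Group by calendar month (as a Period) and compute per-month percentages.
month = df["date"].dt.to_period("M")
s = df.groupby(month)["category_col"].value_counts(normalize=True).mul(100)

# One row per month, one column per category; absent categories become 0.
wide = s.unstack(fill_value=0)

# Replace the period index with readable "Month Year" labels.
wide = wide.rename(index=lambda p: p.strftime("%B %Y"))


def plot_grouped(frame):
    """Draw one group of bars per row (month), one bar per column (category)."""
    import matplotlib.pyplot as plt  # imported here so the data steps run without it

    ax = frame.plot(kind="bar", rot=0)
    ax.set_ylabel("Share of rows (%)")
    plt.tight_layout()
    return ax
```

Calling `plot_grouped(wide)` followed by `plt.show()` should give one cluster of bars per month. The key point is that `DataFrame.plot(kind="bar")` draws one bar per column within each index row, which is why the categories have to be moved into columns first.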
First of all, to get the month names, reset the index and select the relevant columns:
df['month'] = df['date'].apply(lambda x: pd.Timestamp(x).strftime('%B'))
df = df.reset_index()
df = df[['month','category_col','value']]
Then, assuming that you have a dataframe (called df) like this:
month category_col value
September A 41.929004
September B 25.758765
Perform the following to get the plot you are looking for, using Seaborn:
import seaborn as sns
ax = sns.barplot(x="month", y="value", hue="category_col", data=df)
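For completeness, here is one possible way to get from raw rows to the long-form frame that the Seaborn call expects. It is a sketch on synthetic data, not the answerer's exact steps: the column names `month` and `value` follow the table above, `plot_long` is a hypothetical helper name, and grouping by `dt.to_period("M")` is an assumption chosen to avoid the "M" resample alias.

```python
import pandas as pd

# Synthetic data (hypothetical): one row per event, with a date and a category.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-12-05", "2019-12-17", "2019-12-20", "2019-12-28",
        "2020-01-03", "2020-01-09", "2020-01-21", "2020-01-30",
    ]),
    "category_col": ["A", "A", "B", "C", "A", "A", "A", "B"],
})

# Per-month percentages, flattened to one row per (month, category) pair.
month = df["date"].dt.to_period("M")
long_df = (
    df.groupby(month)["category_col"]
    .value_counts(normalize=True)
    .mul(100)
    .reset_index(name="value")  # columns: date, category_col, value
)

# Human-readable month labels for the x axis.
long_df["month"] = long_df["date"].dt.strftime("%B %Y")


def plot_long(frame):
    """Grouped bars: x is the month, hue splits each month by category."""
    import seaborn as sns  # imported here so the data steps run without it

    categories = sorted(frame["category_col"].unique())
    return sns.barplot(
        x="month", y="value", hue="category_col",
        hue_order=categories, data=frame,
    )
```

Unlike the matplotlib route, Seaborn wants the data in long form, so no `unstack` is needed; `hue` does the grouping. Including the year in the label keeps months from different years apart if the data spans more than twelve months.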