Python Pandas - Group by, then plot by category
Very easy pandas question, I'm a beginner.
I have a dataframe 'df' with (for example):
import pandas as pd
df = pd.DataFrame({'time': ['2019-04-23 10:21:00', '2019-04-23 11:14:00', '2019-04-24 11:30'],
'category': ['A', 'B', 'A'],
'text': ['njrnfrjn','fmrjfmrfmr','mjrnfjrnmi']})
I just want to:
Thanks
You can try the following:
df.groupby([df.time.dt.floor('d'), "category"]).size().unstack().plot()
Explanations:
For this, we use groupby with two keys: the day and the category. In the groupby, because we need to group the times by day, one solution is to use dt.floor on the time column. We pass the argument "d" for days.
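As a small illustration of the flooring step, here is a minimal sketch that uses the timestamps from the question. The variable names times and days are only illustrative, and the uppercase alias "D" is used here for the same daily frequency.

```python
import pandas as pd

# Timestamps taken from the question (seconds added to the last one
# so that all three strings share a single format).
times = pd.Series(pd.to_datetime([
    "2019-04-23 10:21:00",
    "2019-04-23 11:14:00",
    "2019-04-24 11:30:00",
]))

# Round every timestamp down to midnight of its day.
days = times.dt.floor("D")
print(days)
```

The first two rows end up on the same day, so they can fall into the same group.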
Also, for floor to be available, the time column must be a time series (datetime dtype). If it's not, convert it with pd.to_datetime(df.time). Now that we have the groups, their sizes can be computed easily with the size method.
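Applied to the dataframe from the question, where time is stored as strings, the conversion and the count could look like the sketch below. It assumes the last timestamp is padded with seconds so that all strings share one format; counts is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    # Strings, as in the question (last value padded to HH:MM:SS, an assumption).
    "time": ["2019-04-23 10:21:00", "2019-04-23 11:14:00", "2019-04-24 11:30:00"],
    "category": ["A", "B", "A"],
    "text": ["njrnfrjn", "fmrjfmrfmr", "mjrnfjrnmi"],
})

# Without this conversion the column has object dtype and .dt is not available.
df["time"] = pd.to_datetime(df["time"])

# One count per (day, category) pair, returned as a Series with a two-level index.
counts = df.groupby([df["time"].dt.floor("D"), "category"]).size()
print(counts)
```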
The next step is to convert the category level (at this step part of the index) into columns. Because we grouped by two keys, we can use unstack.
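One detail worth knowing about unstack: a (day, category) pair that never occurs becomes NaN in the resulting table. The sketch below builds a tiny counts Series by hand (the values are illustrative, not from the answer) and shows the fill_value parameter as one possible way to get 0 instead.

```python
import pandas as pd

# Hand-built counts with a (time, category) index; there is no ("2019-04-24", "B") entry.
counts = pd.Series(
    [1, 1, 1],
    index=pd.MultiIndex.from_tuples(
        [("2019-04-23", "A"), ("2019-04-23", "B"), ("2019-04-24", "A")],
        names=["time", "category"],
    ),
)

# The innermost level (category) becomes the columns; the missing pair is NaN.
table = counts.unstack()
print(table)

# Optionally fill missing pairs with 0 so every category has a value for every day.
filled = counts.unstack(fill_value=0)
print(filled)
```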
Finally, call plot on the dataframe. Because the dataframe is well structured, it works without any arguments (one line is drawn for each column, and the index (time) is used as the x-axis).
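To see what plot does with such a table, here is a minimal sketch on a three-day table shaped like the unstacked result. The Agg backend and the y-axis label are assumptions added so the snippet runs without a display; they are not part of the answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import pandas as pd

# A small table shaped like the unstacked result: days as index, categories as columns.
table = pd.DataFrame(
    {"A": [6, 10, 11], "B": [18, 8, 11], "C": [7, 13, 16]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
)
table.index.name = "time"
table.columns.name = "category"

# plot() returns a matplotlib Axes: one line per column, index on the x-axis.
ax = table.plot()
ax.set_ylabel("rows per day")  # optional label, added for readability
```

In an interactive session you would drop the backend line and call plt.show(), as in the full code.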
Full code + illustration:
# import modules
import pandas as pd
import matplotlib.pyplot as plt
# (here random is just for creating dummy data)
from random import randint, choice
# Create dummy data
size = 1000
df = pd.DataFrame({
'time': pd.to_datetime(["2020/01/{} {}:{}".format(randint(1, 31), randint(0,23), randint(0,59)) for _ in range(size)]),
'text': ['blablabla...' for _ in range(size)],
'category': [choice(["A", "B", "C"]) for _ in range(size)]
})
print(df)
# time text category
# 0 2020-01-30 23:15:00 blablabla... C
# 1 2020-01-16 07:06:00 blablabla... A
# 2 2020-01-03 18:47:00 blablabla... A
# 3 2020-01-21 15:45:00 blablabla... A
# 4 2020-01-10 04:11:00 blablabla... C
# .. ... ... ...
# 995 2020-01-12 03:03:00 blablabla... C
# 996 2020-01-08 10:35:00 blablabla... B
# 997 2020-01-24 20:51:00 blablabla... C
# 998 2020-01-05 07:39:00 blablabla... A
# 999 2020-01-26 16:54:00 blablabla... A
# See size result
print(df.groupby([df.time.dt.floor('d'), "category"]).size())
# time category
# 2020-01-01 A 6
# B 18
# C 7
# 2020-01-02 A 10
# B 8
# ..
# 2020-01-30 B 16
# C 11
# 2020-01-31 A 14
# B 17
# C 11
# See unstack result
print(df.groupby([df.time.dt.floor('d'), "category"]).size().unstack())
# category A B C
# time
# 2020-01-01 6 18 7
# 2020-01-02 10 8 13
# 2020-01-03 11 11 16
# 2020-01-04 9 5 10
# 2020-01-05 13 9 13
# 2020-01-06 11 11 12
# 2020-01-07 13 7 9
# 2020-01-08 5 16 13
# 2020-01-09 15 6 14
# 2020-01-10 10 11 9
# 2020-01-11 7 16 13
# 2020-01-12 12 13 13
# 2020-01-13 12 5 7
# 2020-01-14 11 10 11
# 2020-01-15 13 14 11
# 2020-01-16 9 8 13
# 2020-01-17 8 9 6
# 2020-01-18 12 5 11
# 2020-01-19 7 8 13
# 2020-01-20 12 9 9
# 2020-01-21 9 13 13
# 2020-01-22 14 11 19
# 2020-01-23 14 6 12
# 2020-01-24 7 8 6
# 2020-01-25 10 12 10
# 2020-01-26 8 12 7
# 2020-01-27 18 11 7
# 2020-01-28 15 10 9
# 2020-01-29 12 7 11
# 2020-01-30 12 16 11
# 2020-01-31 14 17 11
# Perform plot
df.groupby([df.time.dt.floor('d'), "category"]).size().unstack().plot()
plt.show()
Output: (image not included; a line plot with one line per category and time on the x-axis)