Python Pandas - group by category, then plot by category

Question

Very easy pandas question, I'm a beginner.

I have a dataframe 'df' with (for example):

import pandas as pd
df = pd.DataFrame({'time': ['2019-04-23 10:21:00', '2019-04-23 11:14:00', '2019-04-24 11:30'], 
                   'category': ['A', 'B', 'A'],
                   'text': ['njrnfrjn','fmrjfmrfmr','mjrnfjrnmi']})

I just want to:

Group by category and date (daily)
Count the number of text messages by category and day
Plot all time series across days (one time series per category, in the same plot)

Thanks

Answer 1

You can try the following:

df.groupby([df.time.dt.floor('d'), "category"]).size().unstack().plot()
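In the question's df, the time column holds plain strings, so the .dt accessor is not available until the column is converted. Below is a minimal sketch on that sample data, assuming a pd.to_datetime conversion first. The third timestamp is written with seconds (":00") so all three strings share one format, since recent pandas versions infer the format from the first value and may reject values that deviate from it. 'D' is used as the daily frequency alias, and the final .plot() is left as a comment so the snippet runs without matplotlib.

```python
import pandas as pd

# The question's sample data; the third timestamp gains ":00" so all strings share one format
df = pd.DataFrame({'time': ['2019-04-23 10:21:00', '2019-04-23 11:14:00', '2019-04-24 11:30:00'],
                   'category': ['A', 'B', 'A'],
                   'text': ['njrnfrjn', 'fmrjfmrfmr', 'mjrnfjrnmi']})

# .dt.floor only works on a datetime column, so convert the strings first
df['time'] = pd.to_datetime(df['time'])

# groupby (day, category) -> count rows -> one column per category
counts = df.groupby([df.time.dt.floor('D'), 'category']).size().unstack()
print(counts)

# counts.plot() would then draw one line per category (requires matplotlib)
```

Category B has no message on 2019-04-24, so that cell comes out as NaN rather than 0.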

Explanations:

The first step is to group, as you mentioned. To do this, we use groupby.
In the groupby, because we need to group the times by day, one solution is to use dt.floor on the time column. We pass the argument "d" for days, which rounds every timestamp down to midnight.
- Also, for floor to be reachable, the time column must have a datetime dtype. If it doesn't, convert it first with df["time"] = pd.to_datetime(df.time).
Now that the rows are grouped, the number of messages per (day, category) pair is computed with the size method.
The next step is to turn category (at this stage an index level) into columns. Because we grouped by two keys, the result has a two-level index, so we can use unstack, which moves the inner level (category) into columns.
Finally, call plot on the dataframe. Because the dataframe is well structured, it works without any arguments: one line is drawn for each column, and the index (time) is used as the x-axis.
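The steps above can be sketched with one named variable per step, which makes each intermediate result easy to inspect. This is a minimal sketch on a hypothetical three-row frame; the names day, counts and wide are illustrative, and pd.api.types.is_datetime64_any_dtype is used here as the "is it already a time series?" check.

```python
import pandas as pd

# Hypothetical sample data with the time column stored as strings
df = pd.DataFrame({'time': ['2020-01-01 09:00:00', '2020-01-01 17:30:00', '2020-01-02 08:15:00'],
                   'category': ['A', 'B', 'B'],
                   'text': ['hi', 'hello', 'hey']})

# Step 0: make sure .dt.floor is reachable (the column needs a datetime dtype)
if not pd.api.types.is_datetime64_any_dtype(df['time']):
    df['time'] = pd.to_datetime(df['time'])

# Step 1: round every timestamp down to midnight so all messages of a day share one key
day = df['time'].dt.floor('D')

# Step 2: group by (day, category) and count rows -> Series with a two-level MultiIndex
counts = df.groupby([day, 'category']).size()

# Step 3: move the inner index level ('category') into columns -> one column per category
wide = counts.unstack()

# Step 4 would be wide.plot(): one line per column, with the index (time) on the x-axis
print(counts)
print(wide)
```

Printing counts next to wide shows the reshaping: the same numbers go from a long (day, category) index to a table with one row per day.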

Full code + illustration:

# import modules 
import pandas as pd
import matplotlib.pyplot as plt
# (here random is just for creating dummy data)
from random import randint, choice

# Create dummy data
size = 1000
df = pd.DataFrame({
    'time': pd.to_datetime(["2020/01/{:02d} {:02d}:{:02d}".format(randint(1, 31), randint(0, 23), randint(0, 59)) for _ in range(size)]),
    'text': ['blablabla...' for _ in range(size)],
    'category': [choice(["A", "B", "C"]) for _ in range(size)]
})
print(df)
#                    time          text category
# 0   2020-01-30 23:15:00  blablabla...        C
# 1   2020-01-16 07:06:00  blablabla...        A
# 2   2020-01-03 18:47:00  blablabla...        A
# 3   2020-01-21 15:45:00  blablabla...        A
# 4   2020-01-10 04:11:00  blablabla...        C
# ..                  ...           ...      ...
# 995 2020-01-12 03:03:00  blablabla...        C
# 996 2020-01-08 10:35:00  blablabla...        B
# 997 2020-01-24 20:51:00  blablabla...        C
# 998 2020-01-05 07:39:00  blablabla...        A
# 999 2020-01-26 16:54:00  blablabla...        A

# See size result
print(df.groupby([df.time.dt.floor('d'), "category"]).size())
# time        category
# 2020-01-01  A            6
#             B           18
#             C            7
# 2020-01-02  A           10
#             B            8
#                         ..
# 2020-01-30  B           16
#             C           11
# 2020-01-31  A           14
#             B           17
#             C           11

# See unstack result
print(df.groupby([df.time.dt.floor('d'), "category"]).size().unstack())
# category     A   B   C
# time
# 2020-01-01   6  18   7
# 2020-01-02  10   8  13
# 2020-01-03  11  11  16
# 2020-01-04   9   5  10
# 2020-01-05  13   9  13
# 2020-01-06  11  11  12
# 2020-01-07  13   7   9
# 2020-01-08   5  16  13
# 2020-01-09  15   6  14
# 2020-01-10  10  11   9
# 2020-01-11   7  16  13
# 2020-01-12  12  13  13
# 2020-01-13  12   5   7
# 2020-01-14  11  10  11
# 2020-01-15  13  14  11
# 2020-01-16   9   8  13
# 2020-01-17   8   9   6
# 2020-01-18  12   5  11
# 2020-01-19   7   8  13
# 2020-01-20  12   9   9
# 2020-01-21   9  13  13
# 2020-01-22  14  11  19
# 2020-01-23  14   6  12
# 2020-01-24   7   8   6
# 2020-01-25  10  12  10
# 2020-01-26   8  12   7
# 2020-01-27  18  11   7
# 2020-01-28  15  10   9
# 2020-01-29  12   7  11
# 2020-01-30  12  16  11
# 2020-01-31  14  17  11

# Perform plot
df.groupby([df.time.dt.floor('d'), "category"]).size().unstack().plot()
plt.show()

Output: [figure: line plot with one line per category (A, B, C) and time on the x-axis]
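With sparser data, two gaps can appear that the dense dummy data above hides: a day with no message in some category becomes NaN after unstack (typically drawn as a break in that line), and a day with no messages at all is missing from the index entirely (the line simply connects its neighbours). A minimal sketch on hypothetical data, assuming zero counts are the desired fill: unstack(fill_value=0) fills the first kind of gap and DataFrame.asfreq('D', fill_value=0) inserts the missing days. pd.crosstab is shown as an alternative count that also fills absent pairs with 0, though it does not add missing days.

```python
import pandas as pd

# Hypothetical data: no messages at all on 2020-01-02, none in category B on 2020-01-03
df = pd.DataFrame({'time': pd.to_datetime(['2020-01-01 09:00:00', '2020-01-01 10:00:00',
                                           '2020-01-03 12:00:00']),
                   'category': ['A', 'B', 'A']})

day = df['time'].dt.floor('D')

# fill_value=0 replaces the NaN that unstack would put in missing (day, category) cells
wide = df.groupby([day, 'category']).size().unstack(fill_value=0)

# asfreq('D') inserts the missing calendar days; fill_value=0 sets their counts to 0
wide = wide.asfreq('D', fill_value=0)

# Alternative: crosstab counts (day, category) pairs directly, filling absent pairs with 0
table = pd.crosstab(day, df['category'])

print(wide)
# wide.plot() then draws continuous lines that drop to 0 on empty days (requires matplotlib)
```

Whether 0 or a gap is the better picture depends on the data: 0 is right when "no row" really means "no messages", while a gap is more honest when rows may be missing for other reasons.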

Answer 1: accepted, 2020-04-26 17:55:05
