Pandas group-by date range & different calculations on multiple columns
I'm having trouble grouping a pandas DataFrame by time range while applying a different calculation to each column.
Let's take the following df:
date identifier value_1 value_2
0 05.07.2018 16:35 A 10 0
1 05.07.2018 16:36 B 20 1
2 05.07.2018 16:37 A 20 2
3 05.07.2018 16:39 B 30 1
4 05.07.2018 16:40 A 40 3
5 05.07.2018 16:41 B 20 2
6 05.07.2018 16:41 A 30 1
7 05.07.2018 16:42 B 50 2
8 05.07.2018 16:43 B 20 3
9 05.07.2018 16:44 A 20 1
As a result I need a df that is grouped by time in 5-minute intervals and by identifier, with the average of value_1 and the sum of value_2:
date identifier value_1 value_2
0 05.07.2018 16:35 A 15 2
1 05.07.2018 16:35 B 25 2
2 05.07.2018 16:40 A 30 5
3 05.07.2018 16:40 B 30 7
How can I do this most efficiently in pandas?
Thanks and best regards from Vienna
You can use groupby, pd.Grouper, and agg, after converting your date column to datetime with the proper format:
import pandas as pd

# Convert date to datetime. I'm assuming it's day.month.year in your original dataframe
df['date'] = pd.to_datetime(df['date'], format='%d.%m.%Y %H:%M')
# Group by identifier and 5-minute bins, then aggregate each column differently
new_df = (df.groupby(['identifier', pd.Grouper(key='date', freq='5min')])
            .agg({'value_1': 'mean', 'value_2': 'sum'}))
>>> new_df
                                value_1  value_2
identifier date
A          2018-07-05 16:35:00       15        2
           2018-07-05 16:40:00       30        5
B          2018-07-05 16:35:00       25        2
           2018-07-05 16:40:00       30        7
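For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question by hand (the construction via `pd.DataFrame` is my assumption about how the data is loaded) and then applies the same `groupby` / `pd.Grouper` / `agg` chain:

```python
import pandas as pd

# Sample data copied from the question; building it from a dict is an
# assumption, the real data may come from a file or database.
df = pd.DataFrame({
    "date": [
        "05.07.2018 16:35", "05.07.2018 16:36", "05.07.2018 16:37",
        "05.07.2018 16:39", "05.07.2018 16:40", "05.07.2018 16:41",
        "05.07.2018 16:41", "05.07.2018 16:42", "05.07.2018 16:43",
        "05.07.2018 16:44",
    ],
    "identifier": ["A", "B", "A", "B", "A", "B", "A", "B", "B", "A"],
    "value_1": [10, 20, 20, 30, 40, 20, 30, 50, 20, 20],
    "value_2": [0, 1, 2, 1, 3, 2, 1, 2, 3, 1],
})

# Parse the day.month.year strings into real timestamps.
df["date"] = pd.to_datetime(df["date"], format="%d.%m.%Y %H:%M")

# Group by identifier and by 5-minute bins of the date column,
# taking the mean of value_1 and the sum of value_2.
new_df = (
    df.groupby(["identifier", pd.Grouper(key="date", freq="5min")])
      .agg({"value_1": "mean", "value_2": "sum"})
)

print(new_df)
```

Note that `pd.Grouper` needs `date` to be a datetime column, so the `pd.to_datetime` step cannot be skipped.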
If you want the same layout as the desired output in your post, you can reset the index and sort:
new_df.reset_index().sort_values(['date','identifier'])
  identifier                date  value_1  value_2
0          A 2018-07-05 16:35:00       15        2
2          B 2018-07-05 16:35:00       25        2
1          A 2018-07-05 16:40:00       30        5
3          B 2018-07-05 16:40:00       30        7
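If you also want the `date` column back in the original `dd.mm.yyyy HH:MM` string form and a clean 0..n index, something along these lines should work. This is only a sketch: the name `result` is mine, the sample frame is rebuilt by hand from the question, and it uses named aggregation (available in pandas 0.25 and later) instead of the dict form, which is a stylistic choice rather than a requirement:

```python
import pandas as pd

# Sample data copied from the question (hand-built for illustration).
df = pd.DataFrame({
    "date": [
        "05.07.2018 16:35", "05.07.2018 16:36", "05.07.2018 16:37",
        "05.07.2018 16:39", "05.07.2018 16:40", "05.07.2018 16:41",
        "05.07.2018 16:41", "05.07.2018 16:42", "05.07.2018 16:43",
        "05.07.2018 16:44",
    ],
    "identifier": ["A", "B", "A", "B", "A", "B", "A", "B", "B", "A"],
    "value_1": [10, 20, 20, 30, 40, 20, 30, 50, 20, 20],
    "value_2": [0, 1, 2, 1, 3, 2, 1, 2, 3, 1],
})
df["date"] = pd.to_datetime(df["date"], format="%d.%m.%Y %H:%M")

result = (
    df.groupby(["identifier", pd.Grouper(key="date", freq="5min")])
      .agg(value_1=("value_1", "mean"), value_2=("value_2", "sum"))
      .reset_index()                         # turn the MultiIndex into columns
      .sort_values(["date", "identifier"])   # order rows as in the question
      .reset_index(drop=True)                # renumber rows 0..n-1
)

# Put the columns in the question's order and format the timestamps
# back into day.month.year strings.
result = result[["date", "identifier", "value_1", "value_2"]]
result["date"] = result["date"].dt.strftime("%d.%m.%Y %H:%M")

print(result)
```

Sorting before calling `strftime` matters in general, because once the dates are strings they sort lexically (by day first) rather than chronologically.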