How to aggregate data by hour based on timestamp in pandas?
I have a dataframe covering a full day, from 00:00:00 to 23:59:59. The table below is just an example; I can't paste the whole thing here because it's too long.
id sm_log_time score 1 score 2
0 2020-04-15 15:25:49 10 10
1 2020-04-15 15:38:55 10 10
2 2020-04-15 15:52:01 10 10
3 2020-04-15 16:05:07 10 10
4 2020-04-15 16:18:13 10 10
My desired dataframe looks something like this. Score 1 and score 2 are summed over all the minutes within each hour:
id sm_log_time score 1 score 2
0 2020-04-15 15:00:00 100 200
1 2020-04-15 16:00:00 230 200
2 2020-04-15 17:00:00 200 300
3 2020-04-15 18:00:00 100 300
4 2020-04-15 19:00:00 100 300
Someone gave me this for reference:
times = pd.to_datetime(df.timestamp_col)
df.groupby([times.dt.hour, times.dt.minute]).value_col.sum()
First, set sm_log_time as the index (it must be a datetime column). Then use the resample method of the time series index:
df.set_index('sm_log_time').resample('H').sum().reset_index()
Result:
sm_log_time id score 1 score 2
0 2020-04-15 15:00:00 3 30 30
1 2020-04-15 16:00:00 7 20 20
Please note that id is also summed; you may drop it if it is not needed. The new row numbers of the resulting dataframe are now in the index.
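For anyone who wants to try this end to end, here is a minimal sketch built from the five sample rows in the question. It assumes sm_log_time arrives as strings and needs parsing, and it drops id before summing so that column is not aggregated. The frequency is written as "60min", which produces the same hourly bins as 'H'; that choice is mine, made because recent pandas releases deprecate the uppercase 'H' alias.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "sm_log_time": [
        "2020-04-15 15:25:49",
        "2020-04-15 15:38:55",
        "2020-04-15 15:52:01",
        "2020-04-15 16:05:07",
        "2020-04-15 16:18:13",
    ],
    "score 1": [10, 10, 10, 10, 10],
    "score 2": [10, 10, 10, 10, 10],
})

# resample needs a datetime index, so parse the strings first
# (assumption: the column is stored as text in the real data).
df["sm_log_time"] = pd.to_datetime(df["sm_log_time"])

# Drop id so it is not summed, then sum the scores per hourly bin.
hourly = (
    df.drop(columns="id")
      .set_index("sm_log_time")
      .resample("60min")
      .sum()
      .reset_index()
)

print(hourly)
```

On the sample rows this gives 30 for the 15:00 bin and 20 for the 16:00 bin in both score columns, matching the result table above, just without the summed id column.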