Pandas group by time interval (5min, 10min, 1day, 1year) and count amount of entries
I have a simple pandas dataframe with around 10000 to 20000 entries for each day. This dataframe contains a point and a datetime (datetime64). For example, it looks like this:
point timestamp_local
0 A 2018-09-29 00:00:20
1 A 2018-09-29 00:04:34
2 A 2018-09-29 00:06:59
3 B 2018-09-29 00:11:09
4 B 2018-09-29 01:19:28
... ... ...
24282 B 2018-09-29 21:40:26
24283 C 2018-09-29 21:40:31
24284 C 2018-09-29 21:45:17
24285 A 2018-09-29 22:20:29
24286 B 2018-09-29 22:28:08
What I now want to get is a dataframe which groups the dataframe above by point and by an interval I specify, and which also counts the number of entries for each point in each interval. The interval should be, for example, a 5 min. interval, a 10 min. interval, or one interval on a daily, monthly or yearly basis.
This is what I got so far to segment the interval:
df['10min_period'] = df.apply(lambda x: "period_%d"%(int(x[1].minute/10) + 1), axis=1)
This returns:
point timestamp_local 10min_period
0 A 2018-09-29 00:00:20 period_1
1 B 2018-09-29 00:04:34 period_1
2 B 2018-09-29 00:06:59 period_1
3 C 2018-09-29 00:11:09 period_2
4 C 2018-09-29 01:19:28 period_2
And this counts the periods:
df = df.groupby([df['point'], df['10min_period']]).agg(['count'])
This returns the following dataframe:
timestamp_local
point 10min_period count
A period_1 2092
period_2 2437
period_3 2181
period_4 2525
period_5 2325
period_6 2317
B period_1 1814
period_2 1719
period_3 1732
period_4 1575
period_5 1789
period_6 1781
... ... ...
But this is not exactly what I want. The reason is that the period entries are wrong. My code has segmented the periods into 10 minute intervals independent of the year, month, day and hour. That is exactly what I don't want!
I want to have a dataframe which is segmented by an interval I have specified, e.g. 5 min., 10 min., 1 day, 1 year and so on, but which considers the year, month, day, hour and minute! (Take a look at how the periods are named!)
I give you an example of what I want:
point timestamp_local 10min_period
0 A 2018-09-29 00:00:20 period_2018-09-29_00:00:00
1 B 2018-09-29 00:04:34 period_2018-09-29_00:00:00
2 B 2018-09-29 00:06:59 period_2018-09-29_00:00:00
3 C 2018-09-29 00:11:09 period_2018-09-29_00:10:00
4 C 2018-09-29 00:19:28 period_2018-09-29_00:10:00
5 A 2018-09-29 00:00:20 period_2018-09-29_00:00:00
6 B 2018-09-30 01:04:34 period_2018-09-30_01:00:00
7 B 2018-09-30 00:06:59 period_2018-09-30_00:00:00
8 C 2018-10-29 02:15:09 period_2018-10-29_02:10:00
9 C 2019-09-29 01:19:28 period_2019-09-29_01:10:00
It's very important to name the period that way so I know which day and interval the entry belongs to. How can I do this? And for example, if it were a 5 minute interval, the periods should be named like period_2018-09-29_00:00:00, period_2018-09-29_00:05:00 and period_2018-09-29_00:25:00 and so on.
Thank you very much!
Are you looking for something like this, for minute intervals:
df.groupby(['point',df.timestamp_local.dt.floor('5Min')]).size()
and this, for month/year:
df.groupby(['point', df.timestamp_local.dt.to_period('M')]).size()
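To get the period names asked for in the question, the floored timestamp can be formatted into a label before grouping. The following is only a minimal sketch, not a definitive implementation: the sample data is made up in the shape of the question's dataframe, and the helper name count_by_interval and the output column names period and count are assumptions chosen for illustration.

```python
import pandas as pd

# Made-up sample in the shape of the question's dataframe (illustrative only).
df = pd.DataFrame({
    "point": ["A", "B", "B", "C", "C", "C"],
    "timestamp_local": pd.to_datetime([
        "2018-09-29 00:00:20",
        "2018-09-29 00:04:34",
        "2018-09-29 00:06:59",
        "2018-09-29 00:11:09",
        "2018-09-29 00:19:28",
        "2019-09-29 01:19:28",
    ]),
})


def count_by_interval(frame, freq):
    """Count rows per point and per fixed-width interval such as '5min' or '10min'.

    Hypothetical helper: the name and the 'period'/'count' columns are assumptions.
    """
    # floor() snaps each timestamp down to the start of its interval,
    # so year, month, day and hour stay part of the grouping key.
    start = frame["timestamp_local"].dt.floor(freq)
    # Format the interval start the way the question names its periods.
    label = start.dt.strftime("period_%Y-%m-%d_%H:%M:%S").rename("period")
    return frame.groupby(["point", label]).size().rename("count").reset_index()


counts_10min = count_by_interval(df, "10min")
print(counts_10min)

# Months and years are not fixed-width, so they are grouped with
# to_period() instead of floor(), as in the answer above.
month = df["timestamp_local"].dt.to_period("M").rename("month")
counts_month = df.groupby(["point", month]).size()
print(counts_month)
```

Because the label is built from the full floored timestamp, entries from different days or years never end up in the same period, which is what the minute-only approach in the question could not guarantee.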