Pandas: group by time interval (5min, 10min, 1day, 1year) and count the number of entries

I have a simple pandas DataFrame with around 10,000 to 20,000 entries per day. It contains a point and a datetime (datetime64) column. For example, it looks like this:

        point   timestamp_local
0       A       2018-09-29 00:00:20
1       A       2018-09-29 00:04:34
2       A       2018-09-29 00:06:59
3       B       2018-09-29 00:11:09
4       B       2018-09-29 01:19:28
...     ...     ...
24282   B       2018-09-29 21:40:26
24283   C       2018-09-29 21:40:31
24284   C       2018-09-29 21:45:17
24285   A       2018-09-29 22:20:29
24286   B       2018-09-29 22:28:08

What I want now is a DataFrame that groups the one above by point and by an interval I specify, and counts the number of entries for each point in each interval. The interval should be selectable, for example 5 minutes, 10 minutes, or one day, one month, or one year.

This is what I have so far to segment the interval:

# Label each row by its 10-minute slot within the hour (period_1 ... period_6)
df['10min_period'] = df.apply(lambda x: "period_%d" % (x['timestamp_local'].minute // 10 + 1), axis=1)

This returns:

    point   timestamp_local         10min_period
0   A       2018-09-29 00:00:20     period_1
1   B       2018-09-29 00:04:34     period_1
2   B       2018-09-29 00:06:59     period_1
3   C       2018-09-29 00:11:09     period_2
4   C       2018-09-29 01:19:28     period_2

And this counts the periods:

df = df.groupby([df['point'], df['10min_period']]).agg(['count'])

This returns the following DataFrame:

                           timestamp_local
point   10min_period       count
A       period_1           2092
        period_2           2437
        period_3           2181
        period_4           2525
        period_5           2325
        period_6           2317
B       period_1           1814
        period_2           1719
        period_3           1732
        period_4           1575
        period_5           1789
        period_6           1781
...     ...                ...

But this is not exactly what I want, because the period labels are wrong. My code splits the timestamps into 10-minute slots based only on the minute, independent of the year, month, day, and hour, so for example 00:05 and 01:05 both end up in period_1. That is exactly what I don't want!
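
A quick check of the problem described above, sketched with three made-up timestamps (not from the original data) that share the same minute-of-hour. The minute-only labelling collapses them into one period, while flooring the full timestamp with `Series.dt.floor` keeps the date and hour.

```python
import pandas as pd

# Hypothetical timestamps: different days/hours, same minute-of-hour
ts = pd.Series(pd.to_datetime([
    "2018-09-29 00:05:00",
    "2018-09-29 01:05:00",
    "2018-09-30 00:05:00",
]))

# The question's labelling: only the minute-of-hour is used
minute_labels = "period_" + (ts.dt.minute // 10 + 1).astype(str)

# Flooring the whole timestamp keeps year, month, day and hour
floored = ts.dt.floor("10min")

print(minute_labels.tolist())  # all three collapse to 'period_1'
print(floored.tolist())
```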

I want a DataFrame segmented by an interval I specify, e.g. 5 min, 10 min, 1 day, 1 year, and so on, that takes the year, month, day, hour, and minute into account. (Note how the periods are named!)

Here is an example of what I want:
        point   timestamp_local         10min_period
    0   A       2018-09-29 00:00:20     period_2018-09-29_00:00:00
    1   B       2018-09-29 00:04:34     period_2018-09-29_00:00:00
    2   B       2018-09-29 00:06:59     period_2018-09-29_00:00:00
    3   C       2018-09-29 00:11:09     period_2018-09-29_00:10:00
    4   C       2018-09-29 00:19:28     period_2018-09-29_00:10:00
    5   A       2018-09-29 00:00:20     period_2018-09-29_00:00:00
    6   B       2018-09-30 01:04:34     period_2018-09-30_01:00:00
    7   B       2018-09-30 00:06:59     period_2018-09-30_00:00:00
    8   C       2018-10-29 02:15:09     period_2018-10-29_02:10:00
    9   C       2019-09-29 01:19:28     period_2019-09-29_01:10:00

It's very important to name the periods this way, so I know which day and interval each entry belongs to. How can I do this? For example, with a 5-minute interval the periods should be named period_2018-09-29_00:00:00, period_2018-09-29_00:05:00, ..., period_2018-09-29_00:25:00, and so on.
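
One way to produce labels in exactly this format, sketched under the assumption that `timestamp_local` is already datetime64: floor each timestamp to the interval start, then format it with `Series.dt.strftime`, which copies the literal `period_` prefix. `period_labels` is a hypothetical helper name, and the sample rows are taken from the example table above. Because the format is zero-padded, the string labels also sort chronologically.

```python
import pandas as pd

def period_labels(ts: pd.Series, freq: str) -> pd.Series:
    """Hypothetical helper: floor each timestamp to `freq` and format it
    as 'period_YYYY-MM-DD_HH:MM:SS'. `freq` must be a fixed-length
    frequency such as '5min', '10min' or '1D'."""
    return ts.dt.floor(freq).dt.strftime("period_%Y-%m-%d_%H:%M:%S")

# Sample rows from the example table in the question
df = pd.DataFrame({
    "point": ["A", "B", "C", "C"],
    "timestamp_local": pd.to_datetime([
        "2018-09-29 00:00:20",
        "2018-09-29 00:06:59",
        "2018-10-29 02:15:09",
        "2019-09-29 01:19:28",
    ]),
})

df["10min_period"] = period_labels(df["timestamp_local"], "10min")
# Count entries per (point, labelled interval)
counts = df.groupby(["point", "10min_period"]).size()
print(df)
print(counts)
```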

Thank you very much!

Are you looking for something like this, for minute intervals:

df.groupby(['point', df.timestamp_local.dt.floor('5min')]).size()
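
A self-contained version of the floor-based grouping above, run on the first five rows from the question. The helper name `count_per_interval` and the output column names `period` and `count` are illustrative choices, not part of the original answer; `reset_index(name="count")` simply turns the grouped sizes back into a flat DataFrame. Any fixed-length frequency string ('5min', '10min', '1D') can be passed in.

```python
import pandas as pd

# First five rows from the question's sample data
df = pd.DataFrame({
    "point": ["A", "A", "A", "B", "B"],
    "timestamp_local": pd.to_datetime([
        "2018-09-29 00:00:20",
        "2018-09-29 00:04:34",
        "2018-09-29 00:06:59",
        "2018-09-29 00:11:09",
        "2018-09-29 01:19:28",
    ]),
})

def count_per_interval(df: pd.DataFrame, freq: str) -> pd.DataFrame:
    """Hypothetical helper: count rows per point and per `freq` interval."""
    # Floor each timestamp to the start of its interval, keeping the date
    bucket = df["timestamp_local"].dt.floor(freq).rename("period")
    # size() counts rows per (point, interval) pair
    return df.groupby(["point", bucket]).size().reset_index(name="count")

print(count_per_interval(df, "5min"))
print(count_per_interval(df, "1D"))
```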

and this, for months or years:

df.groupby(['point', df.timestamp_local.dt.to_period('M')]).size()
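
Calendar months and years have varying lengths, and `dt.floor` only works with fixed-length frequencies, which is presumably why `to_period` is used here instead. A minimal sketch on a few rows from the question's example table; converting each Period's `start_time` into the `period_...` label format is an optional extra, not part of the original answer.

```python
import pandas as pd

# A few rows from the example table in the question
df = pd.DataFrame({
    "point": ["A", "B", "B", "C", "C"],
    "timestamp_local": pd.to_datetime([
        "2018-09-29 00:00:20",
        "2018-09-30 01:04:34",
        "2018-09-30 00:06:59",
        "2018-10-29 02:15:09",
        "2019-09-29 01:19:28",
    ]),
})

# Group by point and calendar month / calendar year
monthly = df.groupby(["point", df["timestamp_local"].dt.to_period("M")]).size()
yearly = df.groupby(["point", df["timestamp_local"].dt.to_period("Y")]).size()

# Optional: label each row by the start of its month,
# in the 'period_YYYY-MM-DD_HH:MM:SS' format from the question
labels = df["timestamp_local"].dt.to_period("M").dt.start_time.dt.strftime(
    "period_%Y-%m-%d_%H:%M:%S"
)

print(monthly)
print(yearly)
print(labels.tolist())
```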
