How to use pandas groupby and Grouper with ngroup?

I have a dataframe with a datetime column. I am trying to group its rows into 24-hour windows, but I am not sure what I am missing. Please let me know where I am going wrong.

For example, my dataframe is as below:

                Dates
0 2021-07-26 07:30:00
1 2021-07-26 13:05:00
2 2021-07-28 08:00:00
3 2021-07-29 00:36:00
4 2021-07-29 16:15:00

I am trying to group the rows and give each group a unique number, where a row belongs to a group if it falls within a 24-hour window that starts at the first date of that group.

That is, it should group the rows and assign a unique number as shown below. It picks the first value and groups all following rows whose time falls within the 24-hour window. So in this example, everything between 2021-07-26 07:30:00 and 2021-07-27 07:30:00 should be group 1, everything between 2021-07-28 08:00:00 and 2021-07-29 08:00:00 should be group 2, and everything between 2021-07-29 16:15:00 and 2021-07-30 16:15:00 should be group 3.

Expected output:

                Dates  groupedbytime
0 2021-07-26 07:30:00   1
1 2021-07-26 13:05:00   1
2 2021-07-28 08:00:00   2
3 2021-07-29 00:36:00   2
4 2021-07-29 16:15:00   3

I am using groupby with Grouper, but I get the output below, where the rows are grouped by day rather than by 24-hour window. Kindly advise how to approach this.

tempdf['groupedbytime'] = tempdf.groupby(pd.Grouper(key="Dates",freq='24H')).ngroup()+1

Output:

                Dates  groupedbytime
0 2021-07-26 07:30:00   1
1 2021-07-26 13:05:00   1
2 2021-07-28 08:00:00   2
3 2021-07-29 00:36:00   3
4 2021-07-29 16:15:00   3

You can work with the timedeltas created by subtracting the first value from the column, then apply integer division by the length of one day. The resulting bin labels can have gaps (here 0, 0, 2, 2, 3), so factorize is added to get consecutive group numbers:

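# Whole 24-hour periods elapsed since the first timestamp
s = df['Dates'].sub(df['Dates'].iat[0]).dt.total_seconds() // (3600 * 24)
# Renumber the bin labels consecutively, starting at 1
df['groupedbytime'] = pd.factorize(s)[0] + 1
print(df)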
                Dates  groupedbytime
0 2021-07-26 07:30:00              1
1 2021-07-26 13:05:00              1
2 2021-07-28 08:00:00              2
3 2021-07-29 00:36:00              2
4 2021-07-29 16:15:00              3
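
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt from the sample in the question. The `by_calendar_day` column is a hypothetical addition for comparison only: it assumes the grouping shown in the question's output amounts to bins cut at midnight, and imitates that with `dt.floor("D")`.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Dates": pd.to_datetime([
        "2021-07-26 07:30:00",
        "2021-07-26 13:05:00",
        "2021-07-28 08:00:00",
        "2021-07-29 00:36:00",
        "2021-07-29 16:15:00",
    ])
})

# Whole 24-hour periods elapsed since the first row
elapsed = df["Dates"].sub(df["Dates"].iat[0])
bins = elapsed.dt.total_seconds() // (24 * 3600)

# Hypothetical comparison column: bins cut at midnight instead
calendar_days = df["Dates"].dt.floor("D")

# factorize replaces the bin labels with consecutive codes starting at 0
df["groupedbytime"] = pd.factorize(bins)[0] + 1
df["by_calendar_day"] = pd.factorize(calendar_days)[0] + 1
print(df)
```

With the sample data, `groupedbytime` reproduces the expected numbering, while `by_calendar_day` puts the row at 2021-07-29 00:36:00 into the third group, as in the question's output.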

With Grouper:

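# Timedeltas measured from the first timestamp
s = df['Dates'].sub(df['Dates'].iat[0])
# Number the 24-hour bins of those timedeltas
s = s.to_frame().groupby(pd.Grouper(key="Dates",freq='24H'))['Dates'].ngroup()
# Renumber consecutively, starting at 1
df['groupedbytime'] = pd.factorize(s)[0] + 1
print(df)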
                Dates  groupedbytime
0 2021-07-26 07:30:00              1
1 2021-07-26 13:05:00              1
2 2021-07-28 08:00:00              2
3 2021-07-29 00:36:00              2
4 2021-07-29 16:15:00              3
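
One caveat: both approaches above use fixed 24-hour bins measured from the first timestamp. For the sample data that matches the description in the question, but the two can differ if each window is meant to restart at the first row that falls outside the previous window. The sketch below is a different technique, not part of the answer above: a plain loop in a hypothetical helper, `number_restarting_windows`, which assumes the dates are sorted ascending and that a window covers the half-open range [start, start + 24 hours).

```python
import pandas as pd


def number_restarting_windows(dates: pd.Series, hours: int = 24) -> pd.Series:
    """Number rows so that each window starts at the first row
    not covered by the previous window.

    Assumes `dates` is sorted ascending and that a window covers
    the half-open range [start, start + hours).
    """
    width = pd.Timedelta(hours=hours)
    labels = []
    window_start = None
    group = 0
    for ts in dates:
        # Open a new window if none exists yet or the current one has ended
        if window_start is None or ts >= window_start + width:
            window_start = ts
            group += 1
        labels.append(group)
    return pd.Series(labels, index=dates.index, name="groupedbytime")


# Sample data copied from the question
df = pd.DataFrame({
    "Dates": pd.to_datetime([
        "2021-07-26 07:30:00",
        "2021-07-26 13:05:00",
        "2021-07-28 08:00:00",
        "2021-07-29 00:36:00",
        "2021-07-29 16:15:00",
    ])
})
df["groupedbytime"] = number_restarting_windows(df["Dates"])
print(df)

# Made-up data where fixed bins and restarting windows disagree
other = pd.Series(pd.to_datetime([
    "2021-07-26 07:30:00",
    "2021-07-27 20:00:00",
    "2021-07-28 08:00:00",
]))
restarting = number_restarting_windows(other)
fixed = pd.factorize(
    other.sub(other.iat[0]).dt.total_seconds() // (24 * 3600)
)[0] + 1
print(restarting.tolist(), fixed.tolist())
```

In the second, made-up series the last two rows are 12 hours apart, so restarting windows keep them in one group, while fixed bins split them at 2021-07-28 07:30:00. The loop costs one Python-level pass over the rows, which is the price of each window depending on where the previous one ended.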
