How to materialize the group key for each row of the original dataframe? ('by' is a pandas Grouper)
For each row of a dataframe, I would like to materialize the group key that row would get if I ran a groupby operation with a pandas Grouper.
import pandas as pd
# Test data
ts = [pd.Timestamp('2022/03/01 09:00'),
      pd.Timestamp('2022/03/01 10:00'),
      pd.Timestamp('2022/03/01 10:30'),
      pd.Timestamp('2022/03/01 15:00')]
df = pd.DataFrame({'a':range(len(ts)), 'ts': ts})
grouper = pd.Grouper(key='ts', freq='2H', sort=False, origin='start_day')
Is there any way to get the corresponding group key for each row? The result I am looking for could be a list, a pandas Series or Index, or a numpy array. It should have the same length as the initial dataframe and contain the following values.
result = pd.Series([pd.Timestamp('2022-03-01 08:00:00'),
                    pd.Timestamp('2022-03-01 10:00:00'),
                    pd.Timestamp('2022-03-01 10:00:00'),
                    pd.Timestamp('2022-03-01 14:00:00')])
Thanks for your help! Best,
This does not use the groupby directly, but you can use:
df['ts'].dt.floor('2H')
With the groupby:
df.groupby(grouper)['ts'].transform(lambda g: g.name)
Output:
0 2022-03-01 08:00:00
1 2022-03-01 10:00:00
2 2022-03-01 10:00:00
3 2022-03-01 14:00:00
Name: ts, dtype: datetime64[ns]
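One caveat about the first approach: `dt.floor` aligns its bins to the epoch (1970-01-01 00:00), whereas the Grouper in the question uses `origin='start_day'`, i.e. midnight of the first day in the data. The two agree here because 2 hours divides a day evenly, but they can differ for a width such as 5 hours. The following is a minimal sketch of the same binning done with explicit arithmetic against a chosen origin. The helper name `bin_start` and its parameters are hypothetical, not part of pandas, and the sketch assumes left-closed, left-labelled bins as in the question.

```python
import pandas as pd

# Same timestamps as in the question.
ts = pd.Series(pd.to_datetime(['2022-03-01 09:00',
                               '2022-03-01 10:00',
                               '2022-03-01 10:30',
                               '2022-03-01 15:00']))

def bin_start(ts, width, origin=None):
    """Return the left edge of the fixed-width bin containing each timestamp.

    Hypothetical helper: `origin` defaults to midnight of the earliest
    timestamp, which mimics origin='start_day'.
    """
    width = pd.Timedelta(width)
    if origin is None:
        origin = ts.min().normalize()
    # Integer number of whole bins between the origin and each timestamp.
    n_bins = (ts - origin) // width
    return origin + n_bins * width

keys_2h = bin_start(ts, '2h')  # same keys as the expected result in the question
keys_5h = bin_start(ts, '5h')  # bins start at 00:00, 05:00, 10:00, 15:00, ...
```

With the 2-hour width this reproduces the four keys listed in the question. With the 5-hour width the bins are anchored at midnight of 2022-03-01 rather than at the epoch, which is the case where a plain floor would not necessarily agree.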
Given:
a ts
0 0 2022-03-01 09:00:00
1 1 2022-03-01 10:00:00
2 2 2022-03-01 10:30:00
3 3 2022-03-01 15:00:00
Doing:
pd.Series(df.resample('2H', origin='start_day', on='ts').groups)
Output:
2022-03-01 08:00:00 1
2022-03-01 10:00:00 3
2022-03-01 12:00:00 3
2022-03-01 14:00:00 4
dtype: int64
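Note that this output has one entry per bin, not one per row, so it does not yet have the shape the question asks for. Reading the values as cumulative row counts (the position just past the last row of each bin), which is an assumption based on the numbers shown above, the per-row keys can be recovered by repeating each bin label by the number of rows it holds. The sketch below hardcodes the output shown above rather than recomputing it, and assumes the rows are sorted by `ts`.

```python
import numpy as np
import pandas as pd

# The per-bin output shown above, typed in by hand:
# bin label -> cumulative number of rows up to the end of that bin (assumed meaning).
ends = pd.Series([1, 3, 3, 4],
                 index=pd.to_datetime(['2022-03-01 08:00',
                                       '2022-03-01 10:00',
                                       '2022-03-01 12:00',
                                       '2022-03-01 14:00']))

# Rows per bin: difference between consecutive cumulative counts.
counts = np.diff(ends.to_numpy(), prepend=0)

# Repeat each bin label once per row in that bin; empty bins vanish.
row_keys = pd.Series(ends.index.repeat(counts))
```

The empty 12:00 bin has a count of zero and so contributes no rows, leaving four keys aligned with the four rows of the dataframe.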