Pandas groupby rolling apply list
Here is the sample code:
import pandas as pd

df = pd.DataFrame({'A': ['2020-04-28','2020-04-28','2020-04-29','2020-04-29','2020-04-30','2020-04-30'],
    'B': ['11-000-000-246_1064461', '30-000-015-488_1191035','11-000-000-246_1064461','30-000-015-488_1191035','30-000-015-488_1191035','11-000-000-246_1064461'],
    # row 2 is an empty list, as shown in the printed frame below
    'C': [[4700652221, 4700652723],[4700653241], [], [4700652781, 4700656546],[4700646464, 4700645646],[4700652748, 4700659873, 4700659238]]
})
My DataFrame looks like this:
A B C
0 2020-04-28 11-000-000-246_1064461 [4700652221, 4700652723]
1 2020-04-28 30-000-015-488_1191035 [4700653241]
2 2020-04-29 11-000-000-246_1064461 []
3 2020-04-29 30-000-015-488_1191035 [4700652781, 4700656546]
4 2020-04-30 30-000-015-488_1191035 [4700646464, 4700645646]
5 2020-04-30 11-000-000-246_1064461 [4700652748, 4700659873, 4700659238]
I tried the following code to get a new column D that, over a 2-day rolling window, holds one array with all the items from the C arrays, but it does not work:
df = df.groupby(['A','B'])['C'].rolling(2).apply(list).reset_index(name = 'D')
I need to get something like this:
A B D
0 2020-04-28 11-000-000-246_1064461 NaN
1 2020-04-28 30-000-015-488_1191035 NaN
2 2020-04-29 11-000-000-246_1064461 [4700652221, 4700652723]
3 2020-04-29 30-000-015-488_1191035 [4700652781, 4700656546, 4700653241]
4 2020-04-30 30-000-015-488_1191035 [4700646464, 4700645646, 4700652781, 4700656546]
5 2020-04-30 11-000-000-246_1064461 [4700652748, 4700659873, 4700659238]
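Read from the expected output, the rule is: for each B, D is the row's own C list followed by the C list of the same B on the previous calendar day, and it is missing when no such row exists. A minimal pure-Python sketch of that rule on the sample data, assuming at most one row per (date, B) pair; `two_day_concat` is a hypothetical name, not a pandas function:

```python
from datetime import date, timedelta

# Sample rows from the question as (A, B, C) tuples.
rows = [
    ("2020-04-28", "11-000-000-246_1064461", [4700652221, 4700652723]),
    ("2020-04-28", "30-000-015-488_1191035", [4700653241]),
    ("2020-04-29", "11-000-000-246_1064461", []),
    ("2020-04-29", "30-000-015-488_1191035", [4700652781, 4700656546]),
    ("2020-04-30", "30-000-015-488_1191035", [4700646464, 4700645646]),
    ("2020-04-30", "11-000-000-246_1064461", [4700652748, 4700659873, 4700659238]),
]


def two_day_concat(rows):
    """Hypothetical helper: current C + previous-day C of the same B, else None.

    Assumes at most one row per (date, B) pair.
    """
    # Map each (day, B) pair to its list for direct lookup.
    by_day = {(date.fromisoformat(a), b): c for a, b, c in rows}
    result = []
    for a, b, c in rows:
        prev = by_day.get((date.fromisoformat(a) - timedelta(days=1), b))
        # An empty list still counts as present; only a missing day gives None.
        result.append(None if prev is None else c + prev)
    return result


for (a, b, _), d in zip(rows, two_day_concat(rows)):
    print(a, b, d)
```

Because this keys on the date rather than on row position, a skipped day yields None instead of silently pairing a row with an older one.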
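Use DataFrame.groupby on column B, then .transform on column C. Inside the transform, use Series.shift to move each group's values down by one row, then add the column to its shifted copy. For lists, + concatenates, so each row gets its own list followed by the previous row's list for the same B; the first row of each group has no predecessor, so it becomes NaN. This uses the previous row of the same B rather than the previous calendar day, so it matches a 2-day window only when each B has one row per day in date order, as in the sample: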
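# Per B group: each row's list followed by the previous row's list.
# The first row of a group has no predecessor, so list + NaN gives NaN.
df['D'] = (
    df.groupby('B')['C']
      .transform(lambda s: s + s.shift(1))
)
# keyword form; pandas 2.0 no longer accepts the axis as a positional argument
df1 = df.drop(columns='C')
# print(df1)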
A B D
0 2020-04-28 11-000-000-246_1064461 NaN
1 2020-04-28 30-000-015-488_1191035 NaN
2 2020-04-29 11-000-000-246_1064461 [4700652221, 4700652723]
3 2020-04-29 30-000-015-488_1191035 [4700652781, 4700656546, 4700653241]
4 2020-04-30 30-000-015-488_1191035 [4700646464, 4700645646, 4700652781, 4700656546]
5 2020-04-30 11-000-000-246_1064461 [4700652748, 4700659873, 4700659238]
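The transform above looks exactly one row back. A minimal sketch generalizing it to the current row plus the previous `window - 1` rows of the same B by adding further shifted copies; `rolling_concat` is a hypothetical helper, not a pandas API. It depends on the same behavior as the answer: in object-dtype Series arithmetic, a list plus NaN comes out as NaN instead of raising, so rows without a full window stay NaN. Like the answer, it counts rows, not calendar days, and assumes rows are in date order within each B.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['2020-04-28', '2020-04-28', '2020-04-29', '2020-04-29', '2020-04-30', '2020-04-30'],
    'B': ['11-000-000-246_1064461', '30-000-015-488_1191035', '11-000-000-246_1064461',
          '30-000-015-488_1191035', '30-000-015-488_1191035', '11-000-000-246_1064461'],
    'C': [[4700652221, 4700652723], [4700653241], [], [4700652781, 4700656546],
          [4700646464, 4700645646], [4700652748, 4700659873, 4700659238]],
})


def rolling_concat(s, window=2):
    """Hypothetical helper: each row's list followed by the lists of the
    previous window - 1 rows in the same Series (newest first).
    With window=2 this is exactly the answer's s + s.shift(1)."""
    combined = s
    for k in range(1, window):
        # list + list concatenates; list + NaN yields NaN for object dtype
        combined = combined + s.shift(k)
    return combined


df['D3'] = df.groupby('B')['C'].transform(lambda s: rolling_concat(s, window=3))
print(df[['A', 'B', 'D3']])
```

With `window=3`, only the third row of each B group has a full window, so the first two rows of each group stay NaN.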