Pandas: add column based on groupby with condition
I have a DataFrame with four columns: id1, id2, age, stime. For example:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 1, 3, pd.to_datetime('2020-01-10 00:30:16')],
[2, 1, 10, pd.to_datetime('2020-01-27 00:20:20')],
[3, 1, 60, pd.to_datetime('2020-01-26 00:10:08')],
[4, 2, 1, pd.to_datetime('2020-01-13 00:20:19')],
[5, 2, 2, pd.to_datetime('2020-01-12 00:40:17')],
[6, 2, 3, pd.to_datetime('2020-01-10 00:10:53')],
[7, 3, 20, pd.to_datetime('2020-01-21 00:20:57')],
[8, 3, 40, pd.to_datetime('2020-01-20 00:10:38')],
[9, 3, 60, pd.to_datetime('2020-01-01 00:30:38')],
]),
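columns=['id1', 'id2', 'age', 'stime'])
# np.array stores the mixed values as objects, so restore proper dtypes
df = df.astype({'id1': 'int64', 'id2': 'int64', 'age': 'int64', 'stime': 'datetime64[ns]'})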
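I would like to add a column whose value is the maximum age among the rows that have the same id2 and whose stime falls within the 2 weeks leading up to that row's stime. So for the example above, I would like to get: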
df2 = pd.DataFrame(np.array([[1, 1, 3, pd.to_datetime('2020-01-10 00:30:16'), 3],
[2, 1, 10, pd.to_datetime('2020-01-27 00:20:20'), 60],
[3, 1, 60, pd.to_datetime('2020-01-26 00:10:08'), 60],
[4, 2, 1, pd.to_datetime('2020-01-13 00:20:19'), 3],
[5, 2, 2, pd.to_datetime('2020-01-12 00:40:17'), 3],
[6, 2, 3, pd.to_datetime('2020-01-10 00:10:53'), 3],
[7, 3, 20, pd.to_datetime('2020-01-21 00:20:57'), 40],
[8, 3, 40, pd.to_datetime('2020-01-20 00:10:38'), 40],
[9, 3, 60, pd.to_datetime('2020-01-01 00:30:38'), 60]
]),
columns=['id1', 'id2', 'age', 'stime', 'max_age_last_2w'])
Since the df I want to do this on is very large, any help on how to do this efficiently would be much appreciated. Thanks in advance!
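To pin down the requirement, it can be written as a slow but explicit reference that checks every row against every other row. This is only a sketch for validating faster approaches on small data, not something to run on a large frame. The helper name `max_age_last_2w_reference` is hypothetical, and it assumes "last 2 weeks" means the 14 days up to and including the row's own stime, with the exact lower boundary excluded.

```python
import pandas as pd

# Sample data from the question, built with proper dtypes.
df = pd.DataFrame({
    "id1": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "id2": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "age": [3, 10, 60, 1, 2, 3, 20, 40, 60],
    "stime": pd.to_datetime([
        "2020-01-10 00:30:16", "2020-01-27 00:20:20", "2020-01-26 00:10:08",
        "2020-01-13 00:20:19", "2020-01-12 00:40:17", "2020-01-10 00:10:53",
        "2020-01-21 00:20:57", "2020-01-20 00:10:38", "2020-01-01 00:30:38",
    ]),
})


def max_age_last_2w_reference(frame):
    """Quadratic reference: for each row, scan all rows with the same id2."""
    window = pd.Timedelta(days=14)  # assumption: "2 weeks" means 14 days
    values = []
    for row in frame.itertuples(index=False):
        mask = (
            (frame["id2"] == row.id2)
            & (frame["stime"] > row.stime - window)  # lower boundary excluded
            & (frame["stime"] <= row.stime)          # the row itself is included
        )
        values.append(frame.loc[mask, "age"].max())
    return pd.Series(values, index=frame.index, name="max_age_last_2w")


reference = max_age_last_2w_reference(df)
```

On the sample data this reproduces the max_age_last_2w column of df2.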
Try:
df['max_age_last_2w'] = df.groupby(['id2', pd.Grouper(key='stime', freq='2W', closed='right')])['age'].transform('max')
Output:
   id1  id2  age               stime  max_age_last_2w
0    1    1    3 2020-01-10 00:30:16                3
1    2    1   10 2020-01-27 00:20:20               60
2    3    1   60 2020-01-26 00:10:08               60
3    4    2    1 2020-01-13 00:20:19                3
4    5    2    2 2020-01-12 00:40:17                3
5    6    2    3 2020-01-10 00:10:53                3
6    7    3   20 2020-01-21 00:20:57               40
7    8    3   40 2020-01-20 00:10:38               40
8    9    3   60 2020-01-01 00:30:38               60
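One caveat: pd.Grouper with freq='2W' assigns each row to a fixed two-week calendar bin, which is not the same as a window covering the two weeks before each row's own stime. The two agree on the sample data, but they can differ on other data, for example when two rows a few days apart fall on either side of a bin edge. If a true per-row trailing window is needed, a time-based rolling window within each id2 group is one option. The following is a minimal sketch, assuming a 14-day window with pandas' default right-closed behaviour for offset windows (the row itself is included, a row exactly 14 days earlier is not). The helper name `trailing_max_age` is hypothetical.

```python
import pandas as pd

# Sample data from the question, built with proper dtypes.
df = pd.DataFrame({
    "id1": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "id2": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "age": [3, 10, 60, 1, 2, 3, 20, 40, 60],
    "stime": pd.to_datetime([
        "2020-01-10 00:30:16", "2020-01-27 00:20:20", "2020-01-26 00:10:08",
        "2020-01-13 00:20:19", "2020-01-12 00:40:17", "2020-01-10 00:10:53",
        "2020-01-21 00:20:57", "2020-01-20 00:10:38", "2020-01-01 00:30:38",
    ]),
})


def trailing_max_age(frame, window="14D"):
    """Max age per id2 over the trailing time window ending at each row."""
    # Time-based rolling needs stime sorted within each group; sorting by
    # id2 first also puts the rows in the order the grouped result comes back.
    ordered = frame.sort_values(["id2", "stime"])
    rolled = (
        ordered.set_index("stime")
        .groupby("id2")["age"]
        .rolling(window)
        .max()
    )
    # Align by position with the sorted frame, then restore the caller's order.
    aligned = pd.Series(rolled.to_numpy(), index=ordered.index)
    return aligned.reindex(frame.index)


df["max_age_last_2w"] = trailing_max_age(df)
```

The rolling maximum comes back as floats; if integers are wanted, `.astype(df["age"].dtype)` can be applied afterwards. The positional alignment step avoids depending on the exact shape of the index that the grouped rolling result carries.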