Pandas: add a groupby-based column subject to a condition

Question

I have a DataFrame with four columns: id1, id2, age, stime. For example:

import numpy as np
import pandas as pd

# np.array on mixed ints and Timestamps produces an object array,
# so proper dtypes are restored after the frame is built.
df = pd.DataFrame(np.array([[1, 1, 3, pd.to_datetime('2020-01-10 00:30:16')],
                         [2, 1, 10, pd.to_datetime('2020-01-27 00:20:20')],
                         [3, 1, 60, pd.to_datetime('2020-01-26 00:10:08')],
                         [4, 2, 1, pd.to_datetime('2020-01-13 00:20:19')],
                         [5, 2, 2, pd.to_datetime('2020-01-12 00:40:17')],
                         [6, 2, 3, pd.to_datetime('2020-01-10 00:10:53')],
                         [7, 3, 20, pd.to_datetime('2020-01-21 00:20:57')],
                         [8, 3, 40, pd.to_datetime('2020-01-20 00:10:38')],
                         [9, 3, 60, pd.to_datetime('2020-01-01 00:30:38')],
                       ]),
                       columns=['id1', 'id2', 'age', 'stime'])
df = df.astype({'id1': int, 'id2': int, 'age': int, 'stime': 'datetime64[ns]'})

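I want to add a column whose value is the maximum age among the rows that share the same id2 and whose stime falls within the 2 weeks leading up to that row's stime. So for the example above, I want to get: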

df2 = pd.DataFrame(np.array([[1, 1, 3, pd.to_datetime('2020-01-10 00:30:16'), 3], 
                         [2, 1, 10, pd.to_datetime('2020-01-27 00:20:20'), 60], 
                         [3, 1, 60, pd.to_datetime('2020-01-26 00:10:08'), 60],
                         [4, 2, 1, pd.to_datetime('2020-01-13 00:20:19'), 3], 
                         [5, 2, 2, pd.to_datetime('2020-01-12 00:40:17'), 3],
                         [6, 2, 3, pd.to_datetime('2020-01-10 00:10:53'), 3], 
                         [7, 3, 20, pd.to_datetime('2020-01-21 00:20:57'), 40],
                         [8, 3, 40, pd.to_datetime('2020-01-20 00:10:38'), 40], 
                         [9, 3, 60, pd.to_datetime('2020-01-01 00:30:38'), 60]
                       ]),
                       columns=['id1', 'id2', 'age', 'stime', 'max_age_last_2w'])

The df I want to do this on is very large, so any help on how to do this efficiently would be much appreciated. Thanks in advance!

Answer 1

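Try the following (stime has to be a datetime column and age numeric, so both are cast first):

df = df.astype({'age': int, 'stime': 'datetime64[ns]'})
df['max_age_last_2w'] = df.groupby(['id2', pd.Grouper(key='stime', freq='2W', closed='right')])['age'].transform('max')

Output: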

  id1 id2 age               stime  max_age_last_2w
0   1   1   3 2020-01-10 00:30:16                3
1   2   1  10 2020-01-27 00:20:20               60
2   3   1  60 2020-01-26 00:10:08               60
3   4   2   1 2020-01-13 00:20:19                3
4   5   2   2 2020-01-12 00:40:17                3
5   6   2   3 2020-01-10 00:10:53                3
6   7   3  20 2020-01-21 00:20:57               40
7   8   3  40 2020-01-20 00:10:38               40
8   9   3  60 2020-01-01 00:30:38               60
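
One thing to be aware of: pd.Grouper with freq='2W' puts rows into fixed two-week calendar bins, so two rows a few days apart can land in different bins, and a row can be grouped with later rows from the same bin. On the sample data this happens to match the expected column. If what is wanted is a trailing 14-day window ending at each row's own stime, a time-based rolling window per id2 is one way to sketch it. This assumes "last 2 weeks" means the interval (stime - 14 days, stime]; the names ordered, rolled and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id1': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'id2': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'age': [3, 10, 60, 1, 2, 3, 20, 40, 60],
    'stime': pd.to_datetime([
        '2020-01-10 00:30:16', '2020-01-27 00:20:20', '2020-01-26 00:10:08',
        '2020-01-13 00:20:19', '2020-01-12 00:40:17', '2020-01-10 00:10:53',
        '2020-01-21 00:20:57', '2020-01-20 00:10:38', '2020-01-01 00:30:38',
    ]),
})

# Sort by group and time: time-based rolling windows need a
# monotonic datetime index within each group.
ordered = df.sort_values(['id2', 'stime'])

# Per id2, take the max age over the trailing 14 days ending at each row
# (window is open on the left and closed on the right by default).
rolled = (
    ordered.set_index('stime')
           .groupby('id2')['age']
           .rolling('14D')
           .max()
)

# rolled comes back group by group in sorted key order, which is the same
# row order as `ordered`, so the values are attached positionally.
ordered['max_age_last_2w'] = rolled.to_numpy().astype(int)

# Back to the original row order.
result = ordered.sort_index()
print(result)
```

On the sample frame this gives the same max_age_last_2w column as in the question. Since the real frame is large, it is probably worth timing both variants on a slice of it before settling on one.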

Answer 1: 0, accepted, 2020-02-20 07:28:19
