Pandas group by 2 columns, apply function, select max value and return index values
Here is the operation I am trying to do:
        ID  SUB_ID  AMOUNT
    1  101       1      50
    2  101       1     -10
    3  101       1     -20
    4  101       2      30
    5  101       2      20
    6  102       3      10
    7  102       3     -10
    8  102       4      10
    9  102       4      10
We want to group by `ID` and `SUB_ID`, then take the sum of the absolute values of `AMOUNT`. Then, within each `ID` group, order this summed column and return the `SUB_ID` of the maximum value.
We can get the summation by:
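    import numpy as np
    import pandas as pd

    # df holds the table above; sum |AMOUNT| for each (ID, SUB_ID) pair
    df1 = (df
           .groupby(['ID', 'SUB_ID'])
           .apply(lambda x: np.sum(np.absolute(x['AMOUNT']))))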
This returns a Series with a MultiIndex:
    ID   SUB_ID
    101  1         80
         2         50
    102  3         20
         4         20
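The same Series can be computed without a per-group Python `lambda`. A minimal sketch, assuming the sample data from the table above (`summed` is just an illustrative name): take the absolute value of the whole column once, then use pandas' built-in grouped `sum`.

```python
import pandas as pd

# Sample data from the question (ID, SUB_ID, AMOUNT)
df = pd.DataFrame({
    "ID": [101, 101, 101, 101, 101, 102, 102, 102, 102],
    "SUB_ID": [1, 1, 1, 2, 2, 3, 3, 4, 4],
    "AMOUNT": [50, -10, -20, 30, 20, 10, -10, 10, 10],
})

# abs() runs once over the whole column; the grouped sum is a built-in
# aggregation, so no Python function is called for each group
summed = df["AMOUNT"].abs().groupby([df["ID"], df["SUB_ID"]]).sum()
print(summed)
```

Grouping by the two column Series keeps their names, so the result has the same `(ID, SUB_ID)` MultiIndex as `df1`.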
From here I would like to return `[1, 3]` (`[1, 4]` is also acceptable, since the two values in the 102 group are equal, but we want only one value per group).
Obviously we could loop and pick the max, but I am trying to find the most efficient way, since this operation will be applied to millions of rows.
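One loop-free option, swapped in here rather than taken from either answer, is `idxmax` on the summed Series grouped by its `ID` level: for each `ID` it returns the full `(ID, SUB_ID)` label of the largest sum, taking the first occurrence on ties. A minimal sketch, assuming the sample data above:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [101, 101, 101, 101, 101, 102, 102, 102, 102],
    "SUB_ID": [1, 1, 1, 2, 2, 3, 3, 4, 4],
    "AMOUNT": [50, -10, -20, 30, 20, 10, -10, 10, 10],
})

# Per-(ID, SUB_ID) sums of |AMOUNT|, a Series with an (ID, SUB_ID) MultiIndex
summed = df["AMOUNT"].abs().groupby([df["ID"], df["SUB_ID"]]).sum()

# For each ID, idxmax returns the (ID, SUB_ID) label of the largest sum;
# on ties it returns the first occurrence, so ID 102 maps to SUB_ID 3
best_labels = summed.groupby(level="ID").idxmax()

# Keep only the SUB_ID part of each (ID, SUB_ID) tuple
best_sub_ids = [int(sub_id) for _, sub_id in best_labels]
print(best_sub_ids)  # [1, 3]
```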
This is one way. As your dataset is large, I strongly recommend you avoid `lambda` functions, since these are not applied in a vectorised fashion.
    res = df.assign(AMOUNT=df['AMOUNT'].abs())\
            .groupby(['ID', 'SUB_ID'], as_index=False).sum()\
            .sort_values('AMOUNT', ascending=False)\
            .groupby('ID').head(1)
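To check the advice about avoiding `lambda` on your own data, a rough timing sketch like the following compares a per-group Python function with the built-in grouped `sum`. The data are synthetic (random IDs, sub-IDs and amounts, an assumption made only for timing), and the timings depend on your machine and pandas version, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data, used only for a rough timing comparison
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    "ID": rng.integers(0, 500, n),
    "SUB_ID": rng.integers(0, 5, n),
    "AMOUNT": rng.integers(-100, 100, n),
})


def with_lambda():
    # A Python lambda is called once per (ID, SUB_ID) group
    return df.groupby(["ID", "SUB_ID"])["AMOUNT"].apply(
        lambda s: np.sum(np.absolute(s))
    )


def vectorised():
    # abs() over the whole column, then the built-in grouped sum
    return df["AMOUNT"].abs().groupby([df["ID"], df["SUB_ID"]]).sum()


# Sanity check: both approaches must produce the same sums
pd.testing.assert_series_equal(with_lambda(), vectorised(), check_names=False)

print("lambda:     %.3f s" % timeit.timeit(with_lambda, number=3))
print("vectorised: %.3f s" % timeit.timeit(vectorised, number=3))
```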
Example
    import pandas as pd

    df = pd.DataFrame([[101, 1, 50], [101, 1, -10], [101, 1, -20], [101, 2, 30],
                       [101, 2, 20], [102, 3, 10], [102, 3, -10], [102, 4, 10], [102, 4, 10]],
                      columns=['ID', 'SUB_ID', 'AMOUNT'])

    res = df.assign(AMOUNT=df['AMOUNT'].abs())\
            .groupby(['ID', 'SUB_ID'], as_index=False).sum()\
            .sort_values('AMOUNT', ascending=False)\
            .groupby('ID').head(1)

    print(res)
        ID  SUB_ID  AMOUNT
    0  101       1      80
    2  102       3      20
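Which of the tied rows in group 102 comes first depends on the sort, and the default `kind='quicksort'` of `sort_values` is not guaranteed to be stable. A minimal variant, assuming you want ties broken by the original `(ID, SUB_ID)` order so the smaller `SUB_ID` wins, uses a stable sort (`kind='mergesort'`) and swaps `drop_duplicates` on `ID` in for `groupby('ID').head(1)`, then returns the `SUB_ID` values as a list in `ID` order:

```python
import pandas as pd

df = pd.DataFrame(
    [[101, 1, 50], [101, 1, -10], [101, 1, -20], [101, 2, 30],
     [101, 2, 20], [102, 3, 10], [102, 3, -10], [102, 4, 10], [102, 4, 10]],
    columns=["ID", "SUB_ID", "AMOUNT"],
)

res = (
    df.assign(AMOUNT=df["AMOUNT"].abs())
    .groupby(["ID", "SUB_ID"], as_index=False)["AMOUNT"].sum()
    # mergesort is stable: rows with equal sums keep their (ID, SUB_ID)
    # order from the groupby, so the smaller SUB_ID stays first on ties
    .sort_values("AMOUNT", ascending=False, kind="mergesort")
    # the first row per ID after sorting holds that ID's largest sum
    .drop_duplicates(subset="ID")
    # restore ID order so the list lines up with the ID groups
    .sort_values("ID")
)

sub_ids = res["SUB_ID"].tolist()
print(sub_ids)  # [1, 3]
```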
I think you can use `nlargest`:
    df1.groupby('ID').nlargest(1).index.get_level_values(level='SUB_ID').tolist()
    # [1, 3]
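A self-contained version of the `nlargest` approach, assuming `df1` is built from the sample data (here with the built-in grouped `sum` rather than `apply`). `keep='first'` is already the default and is spelled out only to show how ties are resolved. The grouped result may carry the `ID` level more than once in its index, depending on the pandas version, which is why picking the `SUB_ID` level by name is convenient.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [101, 101, 101, 101, 101, 102, 102, 102, 102],
    "SUB_ID": [1, 1, 1, 2, 2, 3, 3, 4, 4],
    "AMOUNT": [50, -10, -20, 30, 20, 10, -10, 10, 10],
})

# Per-(ID, SUB_ID) sums of |AMOUNT|, same values and index as df1 in the question
df1 = df["AMOUNT"].abs().groupby([df["ID"], df["SUB_ID"]]).sum()

# Within each ID keep the single largest sum; keep="first" picks the first
# of tied values (SUB_ID 3 rather than 4 for ID 102)
top = df1.groupby(level="ID").nlargest(1, keep="first")

# The SUB_ID level survives in the result's index, so read it back out
sub_ids = top.index.get_level_values("SUB_ID").tolist()
print(sub_ids)  # [1, 3]
```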