pandas DataFrame.groupby and apply custom function
I have a DataFrame with many duplicates (I need each Type/StrikePrice pair to be unique), like this:
                  Pos  AskPrice
Type StrikePrice
C    1500.0        10     281.6
C    1500.0        11     281.9
C    1500.0        12     281.7  <- I need this one
P    1400.0        30    1200.5
P    1400.0        31    1250.2  <- I need this one
How can I group by Type + StrikePrice and apply some logic (my own function) to decide which row of each group to keep (say, the one with the greatest Pos)?
The expected result is:
                  Pos  AskPrice
Type StrikePrice
C    1500.0        12     281.7
P    1400.0        31    1250.2
Thanks a lot!
First call reset_index to get a unique index, then use groupby with idxmax to get the index label of the maximum Pos in each group, select those rows with loc, and finally call set_index to restore the MultiIndex:
df = df.reset_index()
# parentheses let the method chain continue on the next line
df = (df.loc[df.groupby(['Type', 'StrikePrice'])['Pos'].idxmax()]
        .set_index(['Type', 'StrikePrice']))
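The steps above can be tried end to end with a minimal sketch like the following. The sample frame is rebuilt by hand from the table in the question, and the names keys, flat, best_labels and result are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question; the MultiIndex contains duplicates.
df = pd.DataFrame(
    {
        "Type": ["C", "C", "C", "P", "P"],
        "StrikePrice": [1500.0, 1500.0, 1500.0, 1400.0, 1400.0],
        "Pos": [10, 11, 12, 30, 31],
        "AskPrice": [281.6, 281.9, 281.7, 1200.5, 1250.2],
    }
).set_index(["Type", "StrikePrice"])

keys = ["Type", "StrikePrice"]  # illustrative name for the grouping columns

# reset_index gives every row a unique integer label, so the label
# returned by idxmax points at exactly one row per group.
flat = df.reset_index()
best_labels = flat.groupby(keys)["Pos"].idxmax()

# Select the winning rows and restore the original MultiIndex.
result = flat.loc[best_labels].set_index(keys)
print(result)
```

Without the reset_index step, idxmax would return a (Type, StrikePrice) label that is shared by every row in the group, so loc could not single out one row.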
Or use sort_values with drop_duplicates:
df = (df.reset_index()
        .sort_values(['Type', 'StrikePrice', 'Pos'])
        .drop_duplicates(['Type', 'StrikePrice'], keep='last')
        .set_index(['Type', 'StrikePrice']))
print(df)
                  Pos  AskPrice
Type StrikePrice
C    1500.0        12     281.7
P    1400.0        31    1250.2
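As a quick sanity check, the idxmax route and the sort_values/drop_duplicates route can be compared on the sample data. This is only a sketch: the frame is rebuilt by hand from the question, and the names flat, keys, via_idxmax and via_sort are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question, with Type/StrikePrice as plain columns.
flat = pd.DataFrame(
    {
        "Type": ["C", "C", "C", "P", "P"],
        "StrikePrice": [1500.0, 1500.0, 1500.0, 1400.0, 1400.0],
        "Pos": [10, 11, 12, 30, 31],
        "AskPrice": [281.6, 281.9, 281.7, 1200.5, 1250.2],
    }
)
keys = ["Type", "StrikePrice"]

# Approach 1: idxmax picks the label of the largest Pos in each group.
via_idxmax = flat.loc[flat.groupby(keys)["Pos"].idxmax()].set_index(keys)

# Approach 2: sort so the largest Pos comes last within each key pair,
# then keep only that last row.
via_sort = (
    flat.sort_values(keys + ["Pos"])
    .drop_duplicates(keys, keep="last")
    .set_index(keys)
)

# Raises AssertionError if the two results differ in values, dtypes or index.
pd.testing.assert_frame_equal(via_idxmax, via_sort)
```

The two agree here because every group has a single maximum. If several rows tie for the largest Pos, idxmax keeps the first occurrence, while the row kept by keep='last' depends on how the tied rows are ordered after sorting, so the results may differ.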
But if you need a custom function, use GroupBy.apply:
def f(x):
    # keep the row(s) whose Pos equals the maximum of the group
    return x[x['Pos'] == x['Pos'].max()]

df = df.groupby(level=[0, 1], group_keys=False).apply(f)
print(df)
                  Pos  AskPrice
Type StrikePrice
C    1500.0        12     281.7
P    1400.0        31    1250.2
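One thing to watch with a boolean-mask function like this is that it returns every row tied for the maximum, so a group can still contribute more than one row. The sketch below uses hypothetical data with a deliberate tie (not from the question) and an assumed variant, keep_one_max, that returns exactly one row per group.

```python
import pandas as pd

# Hypothetical data with a tie: two rows in the C/1500.0 group share Pos == 12.
df = pd.DataFrame(
    {
        "Type": ["C", "C", "C", "P"],
        "StrikePrice": [1500.0, 1500.0, 1500.0, 1400.0],
        "Pos": [10, 12, 12, 31],
        "AskPrice": [281.6, 281.9, 281.7, 1250.2],
    }
).set_index(["Type", "StrikePrice"])


def keep_all_max(group):
    # Boolean mask: every row tied for the group maximum is returned.
    return group[group["Pos"] == group["Pos"].max()]


def keep_one_max(group):
    # Positional lookup sidesteps the duplicated index labels; argmax
    # returns the first position of the maximum, so one row comes back.
    first_max = group["Pos"].to_numpy().argmax()
    return group.iloc[[first_max]]


grouped = df.groupby(level=[0, 1], group_keys=False)
with_ties = grouped.apply(keep_all_max)
one_per_group = grouped.apply(keep_one_max)
print(len(with_ties), len(one_per_group))
```

Whether ties should be kept or collapsed depends on the data. Any other rule for choosing a row can be placed in the function body, as long as it returns a DataFrame.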