Remove duplicates from dataframe, based on two columns A,B, keeping row with max value in another column C
Python: Remove duplicates from DataFrame based on another column value
Perhaps this would work:
df.sort_values(['Won Turnover', 'Lost Turnover'], ascending=False).drop_duplicates('Supplier')
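To see what this one-liner keeps, here is a minimal sketch on invented data. The sample frame is an assumption (the original input is not shown here); only the column names are taken from the answer. Sorting in descending order puts the highest `Won Turnover` first and leaves NaN values last (the pandas default), so `drop_duplicates('Supplier')` retains a single row per supplier.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "Date": ["25.06.2019"] * 7,
    "Supplier": ["Nike", "Nike", "Nike", "Adidas", "Adidas", "Adidas", "Adidas"],
    "Customer": ["Pepsi", "Pepsi", "McDonalds", "Coca Cola", "Pepsi", "McDonalds", "Pepsi"],
    "Won Turnover": [np.nan, 25000, 10000, 12000, np.nan, 35000, np.nan],
    "Lost Turnover": [10000, np.nan, np.nan, np.nan, 5000, np.nan, 15000],
})

# Sort descending (NaN stays last by default), then keep the first row per Supplier.
one_per_supplier = (
    df.sort_values(["Won Turnover", "Lost Turnover"], ascending=False)
      .drop_duplicates("Supplier")
)
print(one_per_supplier)
```

Note that this returns only one row per `Supplier`. If one row per Date/Supplier/Customer combination is wanted, a list of those columns can be passed to `drop_duplicates` instead.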
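First, use GroupBy.all to test, per group, whether every Won Turnover value is missing, and separately test which rows hold the per-group max of Lost Turnover. Chain the two masks with & (bitwise AND), then add a further condition with | (bitwise OR) that returns all rows where Won Turnover is not missing: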
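# m1: True for every row of a group in which all 'Won Turnover' values are missing
m1 = (df.assign(new=df['Won Turnover'].isna())
        .groupby(['Date', 'Supplier', 'Customer'])['new'].transform('all'))
# m2: True where 'Lost Turnover' equals the maximum of its group
m2 = (df.groupby(['Date', 'Supplier', 'Customer'])['Lost Turnover'].transform('max')
        .eq(df['Lost Turnover']))
# keep the max 'Lost Turnover' row of all-missing groups, plus every row with a 'Won Turnover'
df = df[(m1 & m2) | df['Won Turnover'].notna()]
print(df)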
         Date Supplier   Customer  Won Turnover  Lost Turnover
1  25.06.2019     Nike      Pepsi       25000.0            NaN
2  25.06.2019     Nike  McDonalds       10000.0            NaN
3  25.06.2019   Adidas  Coca Cola       12000.0            NaN
5  25.06.2019   Adidas  McDonalds       35000.0            NaN
6  25.06.2019   Adidas      Pepsi           NaN        15000.0
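For anyone who wants to run the mask approach end to end, the following is a self-contained sketch. The input frame is an assumption: rows 1, 2, 3, 5 and 6 match the output shown above, while rows 0 and 4 are invented to stand in for the rows that get filtered out. The names `keys`, `all_won_missing` and `is_max_lost` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical input: rows 0 and 4 are invented for illustration.
df = pd.DataFrame({
    "Date": ["25.06.2019"] * 7,
    "Supplier": ["Nike", "Nike", "Nike", "Adidas", "Adidas", "Adidas", "Adidas"],
    "Customer": ["Pepsi", "Pepsi", "McDonalds", "Coca Cola", "Pepsi", "McDonalds", "Pepsi"],
    "Won Turnover": [np.nan, 25000, 10000, 12000, np.nan, 35000, np.nan],
    "Lost Turnover": [10000, np.nan, np.nan, np.nan, 5000, np.nan, 15000],
})

keys = ["Date", "Supplier", "Customer"]

# True for every row of a group whose 'Won Turnover' values are all missing.
all_won_missing = (
    df.assign(new=df["Won Turnover"].isna())
      .groupby(keys)["new"].transform("all")
)

# True where 'Lost Turnover' equals the maximum of its group.
is_max_lost = df.groupby(keys)["Lost Turnover"].transform("max").eq(df["Lost Turnover"])

# Keep the max-'Lost Turnover' row of all-missing groups, plus all rows with a 'Won Turnover'.
result = df[(all_won_missing & is_max_lost) | df["Won Turnover"].notna()]
print(result)
```

With this input, row 0 is dropped because its group also contains a row with a `Won Turnover`, and row 4 is dropped because row 6 has the larger `Lost Turnover` in the same group.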