Remove duplicates from a dataframe based on condition?
I have a df with the columns name, cost and status.
      name  cost  status
0     alex     5    pass
1     alex     6    pass
2     alex     7    pass
3   marcus    23    pass
4   marcus    78    fail
5  anthony     1    pass
6     paul    89    pass
7     paul    23    pass
8     paul    10    fail
9     paul     8    pass
If any record for a given name has status = fail, I want to remove all of that user's records, so the result should be:
      name  cost  status
0     alex     5    pass
1     alex     6    pass
2     alex     7    pass
3  anthony     1    pass
Use Series.ne to test which values are not equal to fail, then GroupBy.transform with GroupBy.all to check whether all values are True within each name group, and filter by boolean indexing:
df = df[df['status'].ne('fail').groupby(df['name']).transform('all')]
print(df)
      name  cost  status
0     alex     5    pass
1     alex     6    pass
2     alex     7    pass
5  anthony     1    pass
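To see what each step of the groupby approach produces, here is a minimal self-contained sketch that rebuilds the sample data from the question and splits the one-liner into parts. The intermediate names not_fail, keep and result are only illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["alex", "alex", "alex", "marcus", "marcus",
             "anthony", "paul", "paul", "paul", "paul"],
    "cost": [5, 6, 7, 23, 78, 1, 89, 23, 10, 8],
    "status": ["pass", "pass", "pass", "pass", "fail",
               "pass", "pass", "pass", "fail", "pass"],
})

# True for every row whose status is not "fail"
not_fail = df["status"].ne("fail")

# Broadcast "all rows in this name group are True" back to every row
keep = not_fail.groupby(df["name"]).transform("all")

# Boolean indexing keeps only rows of names with no failed record
result = df[keep]
print(result)
```

Because transform returns a Series aligned with the original index, keep can be used directly as a mask, and the surviving rows retain their original index labels.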
Or get all names that have at least one row where status equals fail, test membership with Series.isin, and invert the mask with ~ to keep only the names that are not in that list:
df = df[~df['name'].isin(df.loc[df['status'].eq('fail'), 'name'])]
print(df)
      name  cost  status
0     alex     5    pass
1     alex     6    pass
2     alex     7    pass
5  anthony     1    pass
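The isin approach can be broken down the same way. The sketch below rebuilds the sample data and uses illustrative names (failed_names, mask, result) that are not part of the original answer. It also adds reset_index(drop=True), on the assumption that you want the renumbered index 0 to 3 shown in the question's expected output rather than the original labels.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["alex", "alex", "alex", "marcus", "marcus",
             "anthony", "paul", "paul", "paul", "paul"],
    "cost": [5, 6, 7, 23, 78, 1, 89, 23, 10, 8],
    "status": ["pass", "pass", "pass", "pass", "fail",
               "pass", "pass", "pass", "fail", "pass"],
})

# Names that have at least one failed record
failed_names = df.loc[df["status"].eq("fail"), "name"]

# True for rows whose name never appears among the failed names
mask = ~df["name"].isin(failed_names)

# Filter, then renumber the index (optional step, an assumption here)
result = df[mask].reset_index(drop=True)
print(result)
```

Dropping the reset_index call leaves the original labels 0, 1, 2 and 5, which matches the output printed above.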