Remove duplicates from a dataframe based on condition?

I have a DataFrame df with the columns name, cost and status.

            name        cost      status
    0       alex        5          pass
    1       alex        6          pass
    2       alex        7          pass
    3       marcus      23         pass
    4       marcus      78         fail
    5       anthony     1          pass
    6       paul        89         pass
    7       paul        23         pass
    8       paul        10         fail
    9       paul         8         pass

If any record for a given name has status = fail, I want to remove all of that user's records. Expected output:

            name        cost      status
    0       alex        5          pass
    1       alex        6          pass
    2       alex        7          pass
    3       anthony     1          pass

Use Series.ne to flag the rows whose status is not equal to fail, then GroupBy.transform with 'all' (GroupBy.all) to test whether every value in each name group is True, and filter with boolean indexing:

# Keep only the rows whose name group contains no 'fail' status
df = df[df['status'].ne('fail').groupby(df['name']).transform('all')]
print(df)
      name  cost status
0     alex     5   pass
1     alex     6   pass
2     alex     7   pass
5  anthony     1   pass
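
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt from the sample data in the question; the intermediate names not_fail, keep and result are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["alex", "alex", "alex", "marcus", "marcus",
             "anthony", "paul", "paul", "paul", "paul"],
    "cost": [5, 6, 7, 23, 78, 1, 89, 23, 10, 8],
    "status": ["pass", "pass", "pass", "pass", "fail",
               "pass", "pass", "pass", "fail", "pass"],
})

# True for every row whose status is not 'fail'
not_fail = df["status"].ne("fail")

# Broadcast the per-name "all True?" result back onto every row of that name
keep = not_fail.groupby(df["name"]).transform("all")

# Boolean indexing keeps only the names with no failing row
result = df[keep]
print(result)
```

Splitting the one-liner into steps makes it easier to inspect the mask: keep is False for every marcus and paul row, because each of those groups contains at least one fail.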

Or collect all names that have at least one row with status equal to fail, test membership with Series.isin, and invert the mask with ~ to keep only the rows whose name is not among them:

# Drop every row whose name appears among the names with a 'fail' status
df = df[~df['name'].isin(df.loc[df['status'].eq('fail'), 'name'])]
print(df)
      name  cost status
0     alex     5   pass
1     alex     6   pass
2     alex     7   pass
5  anthony     1   pass
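
The second approach can be sketched the same way. The expected output in the question shows a fresh 0..3 index, so this sketch assumes that renumbering is wanted and adds reset_index(drop=True); that call is an addition and not part of the original answer, as are the names failed_names and result.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["alex", "alex", "alex", "marcus", "marcus",
             "anthony", "paul", "paul", "paul", "paul"],
    "cost": [5, 6, 7, 23, 78, 1, 89, 23, 10, 8],
    "status": ["pass", "pass", "pass", "pass", "fail",
               "pass", "pass", "pass", "fail", "pass"],
})

# Names that have at least one failing row
failed_names = df.loc[df["status"].eq("fail"), "name"]

# Keep rows whose name is NOT among the failed names, then renumber the index
# (assumption: the question's expected output implies a reset index)
result = df[~df["name"].isin(failed_names)].reset_index(drop=True)
print(result)
```

Omit reset_index if the original row labels (0, 1, 2, 5) should be preserved.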
