Remove duplicates from a dataframe based on condition?

I have a DataFrame df with three columns: name, cost and status.

            name        cost      status
    0       alex        5          pass
    1       alex        6          pass
    2       alex        7          pass
    3       marcus      23         pass
    4       marcus      78         fail
    5       anthony     1          pass
    6       paul        89         pass
    7       paul        23         pass
    8       paul        10         fail
    9       paul         8         pass

If any record for a given name has status = fail, I want to remove all of that user's records. The expected result is:

            name        cost      status
    0       alex        5          pass
    1       alex        6          pass
    2       alex        7          pass
    3       anthony     1          pass

Use Series.ne to test which status values are not equal to fail, then GroupBy.transform with 'all' (GroupBy.all) to check whether every value in each name group is True, and filter the rows with boolean indexing:

df = df[df['status'].ne('fail').groupby(df['name']).transform('all')]
print(df)
      name  cost status
0     alex     5   pass
1     alex     6   pass
2     alex     7   pass
5  anthony     1   pass
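
To make each step of this approach visible, here is a minimal self-contained sketch that rebuilds the sample data from the question and exposes the intermediate masks. The variable names not_fail, keep and result are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["alex", "alex", "alex", "marcus", "marcus",
             "anthony", "paul", "paul", "paul", "paul"],
    "cost": [5, 6, 7, 23, 78, 1, 89, 23, 10, 8],
    "status": ["pass", "pass", "pass", "pass", "fail",
               "pass", "pass", "pass", "fail", "pass"],
})

# Row-level mask: True where the status is anything other than "fail".
not_fail = df["status"].ne("fail")

# Group-level mask broadcast back to every row: True only if all rows
# sharing the same name are True in not_fail.
keep = not_fail.groupby(df["name"]).transform("all")

# Boolean indexing keeps the rows of names that never failed.
result = df[keep]
print(result)
```

Because transform returns a Series aligned with the original index, keep can be used directly as a row filter without any merge or join.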

Alternatively, collect every name that has at least one row with status equal to fail, then use Series.isin with ~ to invert the mask, so that only rows whose name is not in that set are kept:

df = df[~df['name'].isin(df.loc[df['status'].eq('fail'), 'name'])]
print(df)
      name  cost status
0     alex     5   pass
1     alex     6   pass
2     alex     7   pass
5  anthony     1   pass
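
The same idea can be split into two steps, which may be easier to debug. The sketch below again rebuilds the sample data; failed_names and result are illustrative names, and the final reset_index(drop=True) is an optional extra, added on the assumption that the renumbered index shown in the question's expected output (0 to 3) is wanted.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["alex", "alex", "alex", "marcus", "marcus",
             "anthony", "paul", "paul", "paul", "paul"],
    "cost": [5, 6, 7, 23, 78, 1, 89, 23, 10, 8],
    "status": ["pass", "pass", "pass", "pass", "fail",
               "pass", "pass", "pass", "fail", "pass"],
})

# Step 1: names that have at least one "fail" row.
failed_names = df.loc[df["status"].eq("fail"), "name"].unique()

# Step 2: keep rows whose name is not among them, then renumber
# the index (optional; drop this call to keep the original labels).
result = df[~df["name"].isin(failed_names)].reset_index(drop=True)
print(result)
```

Without the reset_index call, the surviving rows keep their original labels 0, 1, 2 and 5, as in the printed output above.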
