Remove duplicates from a dataframe based on condition?

Question

I have a df with the columns name, cost and status.

            name        cost      status
    0       alex        5          pass
    1       alex        6          pass
    2       alex        7          pass
    3       marcus      23         pass
    4       marcus      78         fail
    5       anthony     1          pass
    6       paul        89         pass
    7       paul        23         pass
    8       paul        10         fail
    9       paul         8         pass

If any record for a name has status = fail, I want to remove all of that user's records. Expected output:

            name        cost      status
    0       alex        5          pass
    1       alex        6          pass
    2       alex        7          pass
    3       anthony     1          pass

Answer 1

Use Series.ne to flag the rows whose status is not equal to fail, then GroupBy.transform with 'all' (GroupBy.all) to test whether every value in each name group is True, and filter with boolean indexing:

df = df[df['status'].ne('fail').groupby(df['name']).transform('all')]
print (df)
      name  cost status
0     alex     5   pass
1     alex     6   pass
2     alex     7   pass
5  anthony     1   pass
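
To make the one-liner above easier to follow, here is a minimal sketch that splits it into its intermediate steps. The DataFrame is rebuilt from the sample data in the question; the variable names not_fail, keep and result are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["alex", "alex", "alex", "marcus", "marcus",
             "anthony", "paul", "paul", "paul", "paul"],
    "cost": [5, 6, 7, 23, 78, 1, 89, 23, 10, 8],
    "status": ["pass", "pass", "pass", "pass", "fail",
               "pass", "pass", "pass", "fail", "pass"],
})

# Step 1: True for every row whose own status is not "fail".
not_fail = df["status"].ne("fail")

# Step 2: group that boolean Series by name and broadcast
# "are all values in this group True?" back onto every row.
keep = not_fail.groupby(df["name"]).transform("all")

# Step 3: boolean indexing keeps only names with no fail at all.
result = df[keep]
print(result)
```

Because transform returns a Series aligned with the original index, keep can be used directly as a row mask without any merge or reindexing.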

Or collect all names that have at least one row with status equal to fail, then use Series.isin with ~ to invert the mask so that only rows whose name is not in that set are kept:

df = df[~df['name'].isin(df.loc[df['status'].eq('fail'), 'name'])]
print (df)
      name  cost status
0     alex     5   pass
1     alex     6   pass
2     alex     7   pass
5  anthony     1   pass
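
The isin variant can be sketched step by step in the same way. The DataFrame is again rebuilt from the question's sample data, and the names failed_names, result_isin and result_reset are illustrative. The final reset_index call is an assumption about what the asker wants: the expected output in the question is numbered 0 to 3, whereas the filtered frame keeps the original labels 0, 1, 2, 5.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["alex", "alex", "alex", "marcus", "marcus",
             "anthony", "paul", "paul", "paul", "paul"],
    "cost": [5, 6, 7, 23, 78, 1, 89, 23, 10, 8],
    "status": ["pass", "pass", "pass", "pass", "fail",
               "pass", "pass", "pass", "fail", "pass"],
})

# Names that have at least one row with status "fail".
failed_names = df.loc[df["status"].eq("fail"), "name"]

# Keep only rows whose name is not among the failed names.
result_isin = df[~df["name"].isin(failed_names)]

# Optional: renumber the rows 0..n-1, dropping the old index labels.
result_reset = result_isin.reset_index(drop=True)
print(result_reset)
```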

solution1
0 ACCEPTED 2020-10-06 10:19:35
