Removing rows from a dataframe based on condition or value

Question

Is there a way I can remove data from a df that has been grouped and sorted based on column values?

    id               time_stamp          df  rank
   002         2019-02-23 20:01:13.362  mdf   0
   002         2019-02-23 20:02:06.939  tof   1
   004         2019-03-01 02:30:33.332  mdf   0
   004         2019-03-01 02:34:21.134  tof   1

The data has been grouped by the id column and sorted by ascending timestamp. I want to remove every id that does not have mdf as the df value at rank 0: not just that one row, but all other rows that are a part of that id as well.

For example, if 004 was not mdf at rank 0, I want to remove all 004 rows, if that makes sense.

Thanks for looking!

Answer 1

You could use boolean masking:

mask = df['df'].ne('mdf') & df['rank'].eq(0)
excl_id = df.loc[mask, 'id'].unique()

df[~df['id'].isin(excl_id)]
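
To try this end to end, here is a minimal sketch on made-up sample data. The CSV text and the names `csv_text` and `result` are assumptions for illustration; id 004 is changed so that its rank 0 row is tof rather than mdf:

```python
import io

import pandas as pd

# Hypothetical sample: id 004 has 'tof' at rank 0, so all of its rows should go.
csv_text = """id,time_stamp,df,rank
002,2019-02-23 20:01:13.362,mdf,0
002,2019-02-23 20:02:06.939,tof,1
004,2019-03-01 02:30:33.332,tof,0
004,2019-03-01 02:34:21.134,mdf,1
"""
# dtype=str for id keeps the leading zeros (002 instead of 2).
df = pd.read_csv(io.StringIO(csv_text), dtype={"id": str})

# Rows that sit at rank 0 but are not 'mdf' mark the ids to exclude.
mask = df["df"].ne("mdf") & df["rank"].eq(0)
excl_id = df.loc[mask, "id"].unique()

# Drop every row belonging to an excluded id.
result = df[~df["id"].isin(excl_id)]
print(result)
```

Note that this mask only excludes ids that have a rank 0 row with a non-mdf value. An id with no rank 0 row at all would be kept, so check whether that case can occur in your data.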

Answer 2

Here is my solution:

import io

import pandas as pd

data = """id,time_stamp,df,rank
002,2019-02-23 20:01:13.362,mdf,0
002,2019-02-23 20:02:06.939,tof,1
004,2019-03-01 02:30:33.332,mdf,0
004,2019-03-01 02:34:21.134,tof,1
005,2019-03-01 02:35:21.134,mdf,1
005,2019-03-01 02:35:24.134,tof,1
"""
# pd.compat.StringIO was removed in newer pandas versions; io.StringIO works everywhere.
df = pd.read_csv(io.StringIO(data), sep=',')
print(df)

def process(x):
    # Keep a group only if it has a row with df == 'mdf' at rank 0,
    # so id 005 (no such row) has to be deleted.
    f = x[(x['df'] == 'mdf') & (x['rank'] == 0)]
    return not f.empty

df = df.groupby('id').filter(process).reset_index(drop=True)
print(df)

output:

   id               time_stamp   df  rank
0   2  2019-02-23 20:01:13.362  mdf     0
1   2  2019-02-23 20:02:06.939  tof     1
2   4  2019-03-01 02:30:33.332  mdf     0
3   4  2019-03-01 02:34:21.134  tof     1
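
As a side note, the same filter can probably be written without a Python-level function per group. The following is a sketch of an alternative technique (a boolean `groupby(...).transform("any")`), not what the answer above uses; the names `csv_text`, `is_mdf_first`, `keep` and `filtered` are assumptions for illustration:

```python
import io

import pandas as pd

# Same sample as above; id 005 has no 'mdf' row at rank 0.
csv_text = """id,time_stamp,df,rank
002,2019-02-23 20:01:13.362,mdf,0
002,2019-02-23 20:02:06.939,tof,1
004,2019-03-01 02:30:33.332,mdf,0
004,2019-03-01 02:34:21.134,tof,1
005,2019-03-01 02:35:21.134,mdf,1
005,2019-03-01 02:35:24.134,tof,1
"""
df = pd.read_csv(io.StringIO(csv_text), dtype={"id": str})

# True only on rows that are 'mdf' at rank 0.
is_mdf_first = df["df"].eq("mdf") & df["rank"].eq(0)

# Broadcast "does this id have such a row?" back onto every row of the id.
keep = is_mdf_first.groupby(df["id"]).transform("any")

filtered = df[keep].reset_index(drop=True)
print(filtered)
```

Like the `filter` version, this drops ids that have no rank 0 row at all, because such a group never contains a matching row.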

2 answers

solution1
2 2019-03-05 16:20:36

solution2
0 2019-03-05 16:56:09