Filter pandas dataframe based on opposite condition whether True/False in a column

Question

I want to delete the duplicate rows in the pandas dataframe below, based on the column "msgid", and keep the rows that satisfy the following conditions:

Start by evaluating "tr_flag":

if there is a mix of True and False, keep the True row
if all are False, keep the row with min(evid)
if there is more than one True, keep the row with max(evid).

I tried an SQL approach using a CASE statement with PARTITION BY msgid, but I could only get the first and second scenarios to work, not all three. Is SQL OK, or is there a better approach?

dataset:

         Date plid  evid msgid tr_type  tr_flag
0  08-11-2021  pl1   111  msg1     new    False
1  08-11-2021  pl1   222  msg1     new    False
2  08-11-2021  pl1   333  msg1     new    False
3  08-11-2021  pl1   444  msg2     new    False
4  08-11-2021  pl1   555  msg2     new     True
5  08-11-2021  pl1   666  msg2     new    False
6  08-11-2021  pl1   777  msg3     new     True
7  08-11-2021  pl1   888  msg3     new     True
8  08-11-2021  pl1   999  msg3     new     True

Answer 1

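You can assign a custom sorting key (negative 'evid' where 'tr_flag' is True, positive 'evid' where it is False). With positive 'evid' values, an ascending sort puts the True rows first, largest 'evid' leading, followed by the False rows, smallest 'evid' leading. Sort on the key, group by 'msgid' and keep the first row: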

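# key = -evid where tr_flag is True, +evid where tr_flag is False
(df.assign(key=df['tr_flag'].eq(False).mul(2).sub(1).mul(df['evid']))
   .sort_values(by='key')      # ascending: smallest key first
   .groupby('msgid').first()   # keep the first row of each msgid
   .drop('key', axis=1)        # remove the helper column
)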

output:

             Date plid  evid tr_type  tr_flag
msgid                                        
msg1   08-11-2021  pl1   111     new    False
msg2   08-11-2021  pl1   555     new     True
msg3   08-11-2021  pl1   999     new     True
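
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and applies the same signed-key idea, but it is a variation rather than the exact code above: the key is built with np.where, and drop_duplicates replaces groupby(...).first() so that 'msgid' stays a regular column and the original row labels are kept. The names key and result are only illustrative, and the sketch assumes every 'evid' is positive.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["08-11-2021"] * 9,
    "plid": ["pl1"] * 9,
    "evid": [111, 222, 333, 444, 555, 666, 777, 888, 999],
    "msgid": ["msg1"] * 3 + ["msg2"] * 3 + ["msg3"] * 3,
    "tr_type": ["new"] * 9,
    "tr_flag": [False, False, False, False, True, False, True, True, True],
})

# Signed key: -evid for True rows, +evid for False rows.
# Assumption: evid is always positive, so every True key sorts before every False key.
key = pd.Series(np.where(df["tr_flag"], -df["evid"], df["evid"]), index=df.index)

result = (
    df.loc[key.sort_values().index]  # reorder rows, smallest key first
      .drop_duplicates("msgid")      # keep the first row seen for each msgid
      .sort_index()                  # restore the original row order
)
print(result)
```

On the sample data this keeps the rows with evid 111, 555 and 999, matching the output above. If 'evid' could be zero or negative, the signed key would no longer separate True rows from False rows, and sorting on 'tr_flag' explicitly would be the safer choice.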

solution1
1 ACCEPTED 2021-08-20 08:11:15