Filtering my data based on three columns in pandas

Question

All,

I am confused about how to do this.

Say I have the table below (I have provided a snippet with just two ids, but I have many ids):

      *id*         *status*                     *year*               
        2           active                         2018               
        2           active                         2019                  
        2           dissolved                      2019                
        2           dissolved                      2020 
        3           active                         2018               
        3           dissolved                      2019                  
        3           active                         2019                
        3           dissolved                      2020

I would like to filter it such that, if id and year are the same, I take the row where status is dissolved, giving:

      *id*         *status*                     *year*               
        2           active                         2018                                
        2           dissolved                      2019                
        2           dissolved                      2020 
        3           active                         2018                               
        3           dissolved                      2019                
        3           dissolved                      2020

I have tried:

 df.sort_values(['id','year']).drop_duplicates(subset=['id', 'year'],keep='last')

But sometimes a company goes from dissolved to active again, so I get the active status when I really want the dissolved status if both occur in the same year for that client. That is why I would like to detect whether the statuses differ and, if so, keep the dissolved one. In other words, where there is keep='last', how can I essentially do keep 'dissolved' status?

How can I achieve this?

Answer 1

import pandas as pd

x = pd.DataFrame(
    [(1, "active", "1994"), (1, "dissolved", "1994"), (1, "active", "1995"),
     (1, "dissolved", "1996"), (2, "active", "1996")],
    columns=("id", "status", "year"),
)

# for each (id, year) group with more than one row, keep only the dissolved rows;
# groups with a single row are kept unchanged
parts = []
for _, group in x.groupby(["id", "year"]):
    if len(group) > 1:
        parts.append(group[group["status"] == "dissolved"])
    else:
        parts.append(group)

# DataFrame.append was removed in pandas 2.0, so combine the pieces with pd.concat
y = pd.concat(parts, ignore_index=True)

# now you can do the sorting
y = y.sort_values(["id", "year"])
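
A loop-free variant that stays close to the attempt in the question might look like the sketch below. It assumes the only goal is one row per (id, year) with dissolved preferred; the helper column name _is_dissolved is made up for illustration and is not part of the original data. Sorting on that boolean column puts dissolved rows last within each (id, year) pair, so keep='last' picks them.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "id": [2, 2, 2, 2, 3, 3, 3, 3],
    "status": ["active", "active", "dissolved", "dissolved",
               "active", "dissolved", "active", "dissolved"],
    "year": [2018, 2019, 2019, 2020, 2018, 2019, 2019, 2020],
})

result = (
    # _is_dissolved is a hypothetical helper column: False for active, True for dissolved
    df.assign(_is_dissolved=df["status"].eq("dissolved"))
      # False sorts before True, so dissolved rows end up last within each (id, year)
      .sort_values(["id", "year", "_is_dissolved"])
      .drop_duplicates(subset=["id", "year"], keep="last")
      .drop(columns="_is_dissolved")
      .reset_index(drop=True)
)
print(result)
```

Unlike the loop, this always returns exactly one row per (id, year), even when a pair is repeated without any dissolved row.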

Answer 2

From what I understand, you want to keep only the dissolved row whenever the same id and year appear more than once. Try this:

df[~df.duplicated(['id', 'year'], keep=False) | (df.status == 'dissolved')]
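
As a rough sketch of a mask-based filter on the question's sample table: the idea is to drop an active row only when its (id, year) pair also contains a dissolved row. The variable names is_dissolved, pair_has_dissolved and filtered are illustrative, and this assumes status only takes the two values shown in the question.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "id": [2, 2, 2, 2, 3, 3, 3, 3],
    "status": ["active", "active", "dissolved", "dissolved",
               "active", "dissolved", "active", "dissolved"],
    "year": [2018, 2019, 2019, 2020, 2018, 2019, 2019, 2020],
})

is_dissolved = df["status"].eq("dissolved")

# True for every row whose (id, year) pair contains at least one dissolved row
pair_has_dissolved = is_dissolved.groupby([df["id"], df["year"]]).transform("any")

# Keep dissolved rows, plus any row whose pair has no dissolved row at all
filtered = df[is_dissolved | ~pair_has_dissolved]
print(filtered)
```

This keeps the original row order and index. If a pair could contain several dissolved rows, all of them are kept, so a drop_duplicates on ['id', 'year'] may still be needed afterwards.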
