Python: filter repeated rows with a condition

I have a table that looks like this:

Date           Col0          Col1            Col2   
2-18-2019       1            ap sd            23
2-18-2019       2            dh au            88
2-18-2019       3            ap hre           92
2-19-2019       1            sd ap            23
2-19-2019       2            sd ap            78
2-19-2019       3            ap sd            78
2-20-2019       1            ap sd            37
2-20-2019       2            sd ap            29
2-20-2019       3            djd dh           34
2-21-2019       1            eds ed           44
2-21-2019       2            u4r rg           34
2-21-2019       3            ufif ew          23
2-22-2019       1            eds sd           44
2-22-2019       2            u4r rg           34
2-22-2019       3            ap ew            23

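To reproduce the examples in this thread, the sample table can be loaded into a pandas DataFrame. This is a minimal sketch: the values are transcribed from the table above, and `Date` is kept as plain strings (an assumption; the real data may use datetimes).

```python
import pandas as pd

# Sample data transcribed from the table above (Date kept as strings)
df = pd.DataFrame({
    'Date': ['2-18-2019'] * 3 + ['2-19-2019'] * 3 + ['2-20-2019'] * 3
            + ['2-21-2019'] * 3 + ['2-22-2019'] * 3,
    'Col0': [1, 2, 3] * 5,
    'Col1': ['ap sd', 'dh au', 'ap hre',
             'sd ap', 'sd ap', 'ap sd',
             'ap sd', 'sd ap', 'djd dh',
             'eds ed', 'u4r rg', 'ufif ew',
             'eds sd', 'u4r rg', 'ap ew'],
    'Col2': [23, 88, 92, 23, 78, 78, 37, 29, 34, 44, 34, 23, 44, 34, 23],
})
print(df)
```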
I need to keep the last row containing the keywords when they are repeated over several days; if the keywords appear again a few days later, I need to include them too, as in the result table below.

The result I'm looking for should be something like this:

Date           Col0          Col1            Col2   
2-19-2019       3            ap sd            78
2-20-2019       1            ap sd            37
2-20-2019       2            sd ap            29
2-22-2019       1            eds sd           44
2-22-2019       3            ap ew            23

I tried this:

df = df[df['Col1'].str.contains('ap') | df['Col1'].str.contains('sd')]

but this gives me this result:

Date           Col0          Col1            Col2   
2-18-2019       1            ap sd            23
2-19-2019       1            sd ap            23
2-19-2019       2            sd ap            78
2-19-2019       3            ap sd            78
2-20-2019       1            ap sd            37
2-20-2019       2            sd ap            29
2-22-2019       1            eds sd           44
2-22-2019       3            ap ew            23
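A quick, self-contained check of the attempted filter on the transcribed sample. It references the column as `df['Col1']` (a bare `Col1` name is not defined), and shows that two `str.contains` calls joined with `|` select the same rows as a single regex alternation `'ap|sd'`. Note that on the sample as transcribed, the 2-18-2019 `ap hre` row also contains `'ap'`, so it is kept as well, even though it does not appear in the result listed above; the real data may differ slightly.

```python
import pandas as pd

# Sample data transcribed from the question's table
df = pd.DataFrame({
    'Date': ['2-18-2019'] * 3 + ['2-19-2019'] * 3 + ['2-20-2019'] * 3
            + ['2-21-2019'] * 3 + ['2-22-2019'] * 3,
    'Col0': [1, 2, 3] * 5,
    'Col1': ['ap sd', 'dh au', 'ap hre', 'sd ap', 'sd ap', 'ap sd',
             'ap sd', 'sd ap', 'djd dh', 'eds ed', 'u4r rg', 'ufif ew',
             'eds sd', 'u4r rg', 'ap ew'],
    'Col2': [23, 88, 92, 23, 78, 78, 37, 29, 34, 44, 34, 23, 44, 34, 23],
})

# Two substring tests OR-ed together, as in the attempt
two_tests = df[df['Col1'].str.contains('ap') | df['Col1'].str.contains('sd')]

# The same selection with one regex alternation
one_regex = df[df['Col1'].str.contains('ap|sd')]

print(one_regex)
```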

This is wrong because it returns every matching row. The difference between my result and the desired one above is that when the condition is not met for one or more days (the Date column) and the keywords then show up again, I need to repeat the process.
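One possible reading of "repeat the process" can be sketched as follows. This is an assumption, not the asker's confirmed rule: a run of days ends at any date on which no row matches, and within each run only the last occurrence of each `Col1` value is kept. `last_per_run` is a hypothetical helper, and dates are taken in table order rather than checked for calendar gaps.

```python
import pandas as pd

# Sample data transcribed from the question's table
df = pd.DataFrame({
    'Date': ['2-18-2019'] * 3 + ['2-19-2019'] * 3 + ['2-20-2019'] * 3
            + ['2-21-2019'] * 3 + ['2-22-2019'] * 3,
    'Col0': [1, 2, 3] * 5,
    'Col1': ['ap sd', 'dh au', 'ap hre', 'sd ap', 'sd ap', 'ap sd',
             'ap sd', 'sd ap', 'djd dh', 'eds ed', 'u4r rg', 'ufif ew',
             'eds sd', 'u4r rg', 'ap ew'],
    'Col2': [23, 88, 92, 23, 78, 78, 37, 29, 34, 44, 34, 23, 44, 34, 23],
})


def last_per_run(frame, pattern):
    """Hypothetical helper: keep the last row of each Col1 value within a
    run of dates that have at least one match; a date with no match ends
    the current run, so a value that comes back later is kept again."""
    hit = frame['Col1'].str.contains(pattern)
    # One flag per date, in order of first appearance: any matching row?
    day_hit = hit.groupby(frame['Date'], sort=False).any()
    # The run id increases by one on every date without a match
    run_id = (~day_hit).cumsum()
    runs = frame['Date'].map(run_id)
    matched = frame[hit].assign(run=runs[hit])
    return (matched.drop_duplicates(['run', 'Col1'], keep='last')
                   .drop(columns='run'))


print(last_per_run(df, 'ap|sd'))

# A value that reappears after a day without matches is kept once per run
toy = pd.DataFrame({'Date': ['d1', 'd2', 'd3', 'd4'],
                    'Col1': ['ap sd', 'ap sd', 'xx', 'ap sd']})
print(last_per_run(toy, 'ap|sd'))  # keeps the rows at index 1 and 3
```

On the transcribed sample this returns the same rows as a plain `drop_duplicates('Col1', keep='last')`, because no `Col1` value reappears after 2-21-2019; the two approaches only differ when a value comes back after a gap day, as in the `toy` frame.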

Thanks

If I understand correctly (IIUC), use:

df = df[df.Col1.str.contains('ap|sd')].drop_duplicates('Col1', keep='last')
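A self-contained run of the one-liner above on the transcribed sample, split into its two steps: a regex filter on `Col1`, then `drop_duplicates` keeping the last occurrence of each distinct `Col1` string. As transcribed, it returns 2-18-2019 `ap hre`, 2-20-2019 `ap sd` and `sd ap`, and 2-22-2019 `eds sd` and `ap ew`. The first row differs from the desired output (which lists 2-19-2019 `ap sd` instead), so the intended rule may be somewhat stricter than this reading.

```python
import pandas as pd

# Sample data transcribed from the question's table
df = pd.DataFrame({
    'Date': ['2-18-2019'] * 3 + ['2-19-2019'] * 3 + ['2-20-2019'] * 3
            + ['2-21-2019'] * 3 + ['2-22-2019'] * 3,
    'Col0': [1, 2, 3] * 5,
    'Col1': ['ap sd', 'dh au', 'ap hre', 'sd ap', 'sd ap', 'ap sd',
             'ap sd', 'sd ap', 'djd dh', 'eds ed', 'u4r rg', 'ufif ew',
             'eds sd', 'u4r rg', 'ap ew'],
    'Col2': [23, 88, 92, 23, 78, 78, 37, 29, 34, 44, 34, 23, 44, 34, 23],
})

# Step 1: keep rows whose Col1 contains 'ap' or 'sd'
mask = df['Col1'].str.contains('ap|sd')

# Step 2: keep only the last occurrence of each distinct Col1 string;
# original row order is preserved
result = df[mask].drop_duplicates('Col1', keep='last')
print(result)
```

Note that `drop_duplicates` compares the whole `Col1` string, so `ap sd` and `sd ap` are treated as different keys.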

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM