
Complex dataframe filtering on the last occurrence of a value in pandas/Python [EDIT]

I am having a hard time with a complex dataframe filtering task.

Here is the problem:

Among the rows that share the same 'id', the column 'job' can take the values 'fireman', NaN (missing), or 'policeman'.

I would like to filter my dataframe so that, for each id, I keep only the rows from the start of the last consecutive run of 'fireman' in job onward. I think I first have to group by 'job' values and then filter:

 df.groupby("job").filter(lambda x: f(x))

I don't know which function f is appropriate.

Any ideas?

To try:

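import pandas as pd

# Rows without a job are given an explicit None so every row has three fields.
df = pd.DataFrame([[79,1,None], [79,2,'fireman'],[79,3,'fireman'],[79,4,None],[79,5,None],[79,6,'fireman'],[79,7,'fireman'],[79,8,'policeman']], columns=['id','day','job'])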


output = pd.DataFrame([[79,6,'fireman'],[79,7,'fireman'],[79,8,'policeman']], columns=['id','day','job'])

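Here is a version that does not need extra variables. Note that it is written against columns named imo and polygon and the value 'FE'; to run it on the sample above, substitute id, job, and 'fireman':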

df.groupby('imo').apply(lambda grp: grp[grp.index >= 
                                        ((grp.polygon.shift() != grp.polygon) & 
                                         (grp.polygon.shift(-1) == grp.polygon) & 
                                         (grp.polygon == 'FE')
                                        ).cumsum().idxmax()]
                       ).reset_index(level=0, drop=True)
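
For comparison, here is a minimal sketch of the same idea written against the sample data from the question (columns id, day, job). The helper name keep_from_last_fireman_run is hypothetical, and the sketch makes two assumptions that the question does not settle: a single isolated 'fireman' row counts as a run (the one-liner above requires at least two in a row), and an id with no 'fireman' row at all is dropped entirely.

```python
import pandas as pd

df = pd.DataFrame(
    [[79, 1, None], [79, 2, 'fireman'], [79, 3, 'fireman'], [79, 4, None],
     [79, 5, None], [79, 6, 'fireman'], [79, 7, 'fireman'], [79, 8, 'policeman']],
    columns=['id', 'day', 'job'],
)

def keep_from_last_fireman_run(grp):
    """Return the rows of grp from the start of its last 'fireman' run onward."""
    is_fireman = grp['job'].eq('fireman')
    # A run starts on a 'fireman' row whose previous row is not 'fireman'.
    run_start = is_fireman & ~is_fireman.shift(fill_value=False)
    if not run_start.any():
        # Assumption: ids that never have 'fireman' contribute no rows.
        return grp.iloc[0:0]
    # Positional index of the last run start within this group.
    last_start = run_start.to_numpy().nonzero()[0][-1]
    return grp.iloc[last_start:]

# Apply the helper to each id separately and stitch the pieces back together.
result = pd.concat(
    [keep_from_last_fireman_run(g) for _, g in df.groupby('id', sort=False)]
)
print(result)
```

On the sample frame this keeps the rows for days 6, 7, and 8, which matches the expected output given in the question. Iterating over the groups explicitly avoids relying on how groupby(...).apply handles the group keys in the result index, which has varied between pandas versions.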

