I am having a hard time with a complex DataFrame filtering problem.
Here is the problem:
Among rows that share the same 'id', the column 'job' can take the values 'fireman', NaN, or 'policeman'.
I would like to filter my DataFrame so that, for each id,
I keep only the rows from the start of the last consecutive run of 'fireman' in 'job' onward. I think I first have to group by the 'job' values and filter:
df.groupby("job").filter(lambda x: f(x))
I don't know which function f is appropriate.
Any ideas?
To try:
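import pandas as pd
# Input: rows with no job are written with an explicit None
df = pd.DataFrame([[79,1,None], [79,2,'fireman'],[79,3,'fireman'],[79,4,None],[79,5,None],[79,6,'fireman'],[79,7,'fireman'],[79,8,'policeman']], columns=['id','day','job'])
# Expected result: everything from the start of the last 'fireman' run onward
output = pd.DataFrame([[79,6,'fireman'],[79,7,'fireman'],[79,8,'policeman']], columns=['id','day','job'])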
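Here is a version that needs no extra variables (the names 'imo', 'polygon' and 'FE' presumably stand in for 'id', 'job' and 'fireman' from the question):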
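df.groupby('imo').apply(
    lambda grp: grp[grp.index >= (
        (grp.polygon.shift() != grp.polygon)      # previous row differs: a run starts here
        & (grp.polygon.shift(-1) == grp.polygon)  # next row matches: the run has at least two rows
        & (grp.polygon == 'FE')                   # the run is made of the target value
    ).cumsum().idxmax()]                          # first label where the count of run starts peaks, i.e. the last run start
).reset_index(level=0, drop=True)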
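For comparison, here is a minimal sketch of the same idea written against the column names from the question ('id', 'day', 'job') and without `apply`. It is not the code above. The helper name `keep_from_last_run` and its parameters are made up for illustration. It assumes that rows for each id are contiguous and already ordered by day, that a single 'fireman' row counts as a run (the one-liner above requires at least two), and that ids with no 'fireman' row at all should be dropped.

```python
import pandas as pd

df = pd.DataFrame(
    [[79, 1, None], [79, 2, 'fireman'], [79, 3, 'fireman'], [79, 4, None],
     [79, 5, None], [79, 6, 'fireman'], [79, 7, 'fireman'], [79, 8, 'policeman']],
    columns=['id', 'day', 'job'],
)

def keep_from_last_run(df, key='id', col='job', value='fireman'):
    """Hypothetical helper: keep rows from the last run of `value` onward, per `key`."""
    is_val = df[col].eq(value)                    # True where job == 'fireman' (None/NaN -> False)
    same_key = df[key].eq(df[key].shift())        # True if the previous row has the same id
    prev_is_val = is_val.shift(fill_value=False) & same_key
    starts = is_val & ~prev_is_val                # first row of each run of `value`
    # Number the runs within each id; rows after a run keep that run's number.
    run_no = starts.astype(int).groupby(df[key]).cumsum()
    last_run = run_no.groupby(df[key]).transform('max')
    # Keep rows belonging to (or following) the last run; drop ids with no run at all.
    return df[(run_no == last_run) & (last_run > 0)]

result = keep_from_last_run(df)
print(result['day'].tolist())  # [6, 7, 8]
```

The original index labels are preserved, so add `.reset_index(drop=True)` if the result should compare equal to the `output` frame from the question.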