Pandas DataFrame: select rows that meet a condition, plus the N rows preceding each match

I have a DataFrame and I want to select every row that satisfies a condition, together with the N rows immediately before each such row.

Example:

import pandas as pd
df = pd.DataFrame({'value':[10,20,30,40,50,60,70,80,90],'is_fishing':['NO','NO','YES','NO','YES','NO','NO','NO','YES']})

     value     is_fishing
0     10         NO
1     20         NO
2     30        YES
3     40         NO
4     50        YES
5     60         NO
6     70         NO
7     80         NO
8     90        YES

Expected output with N=1 and the condition is_fishing == 'YES':

     value     is_fishing
1     20         NO
2     30        YES
3     40         NO
4     50        YES
7     80         NO
8     90        YES

NumPy's split

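import numpy as np
def n_prior_to_condition(df, n, condition):
    # Split points: the position just after each matching row.
    i = np.flatnonzero(condition) + 1
    # Every chunk before the last ends on a match; keep the last n+1 rows of each chunk.
    return pd.concat([d.tail(n+1) for d in np.split(df, i)])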

n_prior_to_condition(df, 1, df.is_fishing=="YES")

   value is_fishing
1     20         NO
2     30        YES
3     40         NO
4     50        YES
7     80         NO
8     90        YES
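
A possible variant of the split approach, offered as a sketch rather than as the answer's own code. It splits an array of row positions instead of the DataFrame itself, because recent pandas releases appear to emit a deprecation warning when np.split is applied to a DataFrame directly (that path goes through DataFrame.swapaxes). It also drops the chunk after the last match, so trailing rows that are not followed by a match are not returned. The function name n_prior_by_position and the extra tenth row in the sample data are assumptions added here to show that edge case.

```python
import numpy as np
import pandas as pd


def n_prior_by_position(df, n, condition):
    """Hypothetical helper: each matching row plus up to n rows before it."""
    mask = np.asarray(condition, dtype=bool)
    # Split points: the position just after each matching row.
    split_points = np.flatnonzero(mask) + 1
    # Split row positions, not the DataFrame. Every chunk except the last
    # ends on a matching row, so the last chunk is dropped.
    chunks = np.split(np.arange(len(df)), split_points)[:-1]
    if not chunks:
        return df.iloc[0:0]  # no match: empty frame with the same columns
    keep = np.concatenate([chunk[-(n + 1):] for chunk in chunks])
    return df.iloc[keep]


df = pd.DataFrame({
    "value": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    "is_fishing": ["NO", "NO", "YES", "NO", "YES", "NO", "NO", "NO", "YES", "NO"],
})

result = n_prior_by_position(df, 1, df["is_fishing"] == "YES")
print(result.index.tolist())  # [1, 2, 3, 4, 7, 8]
```

Row 9 comes after the last YES, so it is left out. With the original sample data, where the last row is a match, the result is the same as above.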

groupby

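def n_prior_to_condition(df, n, condition):
    # Label each row with the number of matches at or below it (reverse cumsum).
    groups = condition.iloc[::-1].cumsum()
    # The labels align with df on the index; keep the last n+1 rows of each label.
    return df.groupby(groups).tail(n+1)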

n_prior_to_condition(df, 1, df.is_fishing=="YES")

   value is_fishing
1     20         NO
2     30        YES
3     40         NO
4     50        YES
7     80         NO
8     90        YES
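
To see why the reversed cumulative sum works, it can help to print the labels it produces. The sketch below does that and also guards one edge case: rows after the last match get label 0 and would otherwise be returned by tail. The function name n_prior_to_condition_strict, the switch to positional NumPy arrays, and the extra tenth row in the sample data are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd


def n_prior_to_condition_strict(df, n, condition):
    """Hypothetical variant: ignore rows that come after the last match."""
    # Count the matches at or below each row, by position (reverse cumsum).
    labels = condition.to_numpy()[::-1].cumsum()[::-1]
    has_match = labels > 0  # False only for rows after the last match
    # Group the remaining rows by label and keep the last n+1 of each group.
    return df[has_match].groupby(labels[has_match]).tail(n + 1)


df = pd.DataFrame({
    "value": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    "is_fishing": ["NO", "NO", "YES", "NO", "YES", "NO", "NO", "NO", "YES", "NO"],
})
condition = df["is_fishing"] == "YES"

# Rows that share a label form one block; each nonzero block ends on a YES row.
labels = condition.to_numpy()[::-1].cumsum()[::-1]
print(labels.tolist())  # [3, 3, 3, 2, 2, 1, 1, 1, 1, 0]

result = n_prior_to_condition_strict(df, 1, condition)
print(result.index.tolist())  # [1, 2, 3, 4, 7, 8]
```

Working on positional arrays rather than an index-aligned Series is a design choice here: it keeps the labels tied to row order even if the DataFrame's index is not unique.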