Pandas Resampling based on Value exceeding threshold

I have a DataFrame with 2 columns.

import pandas as pd
data = pd.DataFrame({'a':[1,2,1,4,1,1,3,1,4,1,1,1],'b':[5,2,8,3,10,3,5,15,45,41,23,9]}) 

    a   b
0   1   5
1   2   2
2   1   8
3   4   3
4   1   10
5   1   3
6   3   5
7   1   15
8   4   45
9   1   41
10  1   23
11  1   9

Is there a pythonic (or faster) way to pick out the row indices at which the cumulative sum of column a, counted since the previous such index, reaches or exceeds a given threshold? For example, in the DataFrame above, with a threshold of 5 I would get indices 3, 6 and 8.

Currently I loop through every row and keep track of when the running total reaches the threshold. I am not enough of a Python expert to come up with a better way, if one exists.

thanks

Until someone comes up with a pandas one-liner (if that is possible), you could try the following approach, which recomputes the cumulative sum on the remaining rows after each hit:

From IPython session:

In [393]: get_a_cumsum_lim = lambda df, col, threshold: df[col][df[col].cumsum() >= threshold]

In [394]: s, result = get_a_cumsum_lim(data, 'a', 5), []

In [395]: while not s.empty:
     ...:     idx = s.index[0]
     ...:     result.append(idx)
     ...:     s = get_a_cumsum_lim(data[idx+1:], 'a', 5)
     ...:     
     ...:     

In [396]: result
Out[396]: [3, 6, 8]
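
If a plain loop is acceptable, a single pass over the column avoids recomputing the cumulative sum after every hit. The following is only a minimal sketch of that idea, not a vectorized pandas solution; the function name threshold_crossings is made up for illustration, and it assumes the same ">= threshold" rule as the lambda above:

```python
import pandas as pd


def threshold_crossings(df, col, threshold):
    """Return the index labels at which the running sum of df[col],
    restarted after each hit, reaches or exceeds threshold.

    Hypothetical helper for illustration only.
    """
    hits = []
    running = 0
    for label, value in df[col].items():
        running += value
        if running >= threshold:
            hits.append(label)
            running = 0  # restart the sum after recording a hit
    return hits


data = pd.DataFrame({'a': [1, 2, 1, 4, 1, 1, 3, 1, 4, 1, 1, 1],
                     'b': [5, 2, 8, 3, 10, 3, 5, 15, 45, 41, 23, 9]})

print(threshold_crossings(data, 'a', 5))  # [3, 6, 8]
```

This visits each row once, whereas the while loop above calls cumsum again on the remaining slice for every index it finds. It also works with index labels directly, so it does not rely on the default integer index the way data[idx+1:] does.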
