I have a DataFrame with 2 columns.
import pandas as pd
data = pd.DataFrame({'a':[1,2,1,4,1,1,3,1,4,1,1,1],'b':[5,2,8,3,10,3,5,15,45,41,23,9]})
    a   b
0   1   5
1   2   2
2   1   8
3   4   3
4   1  10
5   1   3
6   3   5
7   1  15
8   4  45
9   1  41
10  1  23
11  1   9
Is there a Pythonic/fast way to pick out the row indices at which the cumulative sum of column a since the last such index reaches a given threshold? For example, in the above df, with a threshold of 5 I would get indices 3, 6, 8.
The way I'm currently doing it is to loop through every row and keep track of the running total until it reaches the threshold. I am not enough of a Python expert to come up with a better way, if one exists.
Thanks!
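For reference, a plain-loop version of the approach described above (keeping a running total and resetting it each time the threshold is reached; the function name `threshold_indices` is my own) might look like:

```python
import pandas as pd

data = pd.DataFrame({'a': [1, 2, 1, 4, 1, 1, 3, 1, 4, 1, 1, 1],
                     'b': [5, 2, 8, 3, 10, 3, 5, 15, 45, 41, 23, 9]})

def threshold_indices(s, threshold):
    """Collect indices where the running sum since the last hit reaches threshold."""
    result, running = [], 0
    for idx, value in s.items():
        running += value
        if running >= threshold:
            result.append(idx)
            running = 0  # reset the running total after each hit
    return result

threshold_indices(data['a'], 5)  # [3, 6, 8]
```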
Until someone comes up with a pandas
one-liner (if one is possible), you could try the following approach.
From IPython session:
In [393]: get_a_cumsum_lim = lambda df, col, threshold: df[col][df[col].cumsum() >= threshold]
In [394]: s, result = get_a_cumsum_lim(data, 'a', 5), []
In [395]: while not s.empty:
...: idx = s.index[0]
...: result.append(idx)
...: s = get_a_cumsum_lim(data[idx+1:], 'a', 5)
...:
...:
In [396]: result
Out[396]: [3, 6, 8]
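If column a is guaranteed non-negative (so the cumulative sum is monotone non-decreasing), a sketch that computes the cumsum once and jumps to each crossing with np.searchsorted, instead of re-slicing and re-summing, could look like this (the function name is my own; positions returned are positional, which matches the labels here because the index is a default RangeIndex):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'a': [1, 2, 1, 4, 1, 1, 3, 1, 4, 1, 1, 1],
                     'b': [5, 2, 8, 3, 10, 3, 5, 15, 45, 41, 23, 9]})

def threshold_indices(values, threshold):
    """Find each position where the cumulative sum since the last hit reaches threshold."""
    csum = np.asarray(values).cumsum()
    result = []
    # first position where the cumulative sum reaches the threshold
    pos = np.searchsorted(csum, threshold)
    while pos < len(csum):
        result.append(int(pos))
        # next crossing: cumulative sum must grow by another `threshold`
        pos = np.searchsorted(csum, csum[pos] + threshold)
    return result

threshold_indices(data['a'], 5)  # [3, 6, 8]
```

searchsorted is valid here only because a non-negative column makes csum sorted; with possible negative values you would fall back to a scan.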