
In pandas dataframe - returning last value of cumulative sum that satisfies condition

index [0, 1, 2, 3, 4, 5]
part_1 [4, 5, 6, 4, 8, 4]
part_2 [11, 12, 10, 12, 14, 13]
new [6, 4, 8, 8, na, na]

I'm a beginner in Python and pandas and would appreciate some help. In a simple dataframe, I want to create a new column that gives, for each row, the part_1 value at the last row of a cumulative sum that satisfies the condition

df.part_1.cumsum() > df.part_2

So, for example, the new column at index 0 would get the value 6, since (4+5+6) > 11.

Thanks!
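To make the rule in the example concrete, here is a minimal sketch for a single row. It assumes the running sum restarts at the row in question and uses the strict > comparison from the condition above. The helper name value_where_sum_exceeds is hypothetical and only for illustration.

```python
import numpy as np

part_1 = np.array([4, 5, 6, 4, 8, 4])
part_2 = np.array([11, 12, 10, 12, 14, 13])

def value_where_sum_exceeds(values, start, target):
    """Hypothetical helper: return the element of `values` at which the
    running sum begun at `start` first becomes strictly greater than
    `target`, or None if that never happens."""
    running = 0
    for v in values[start:]:
        running += v
        if running > target:
            return int(v)
    return None  # the sum never exceeds the target

# Index 0: the running sums are 4, 9, 15, and 15 > 11 is reached at the value 6.
result_0 = value_where_sum_exceeds(part_1, 0, part_2[0])
print(result_0)  # 6
```

For the last two rows the running sum never exceeds part_2, so the helper returns None, which corresponds to the na entries in the expected column.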

IIUC, here is a NumPy-based approach. The idea is to build an upper triangular matrix with a shifted version of the input array (part_1) in each row. By taking the cumulative sum along each row and comparing it against part_2 (the third column of a, since index is stored as a regular column), we can use argmax to find, for each row, the first position where the running sum is greater than or equal to the part_2 value at that row. The code uses >=; swap in > if the comparison should be strict, as in the question:

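import numpy as np

a = df.to_numpy()  # columns: index, part_1, part_2
# row i holds part_1[i:] with zeros before position i; cumsum(1) gives running sums starting at row i
cs = np.triu(a[:,1]).cumsum(1)
# first position in each row where the running sum reaches part_2
ix = (cs >= a[:,2,None]).argmax(1)
# array([2, 3, 3, 4, 6, 7, 7, 0], dtype=int64)
df['first_ix'] = a[ix,1]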

print(df)
   index  part_1  part_2  first_ix
0      0       4      11         6
1      1       5      12         4
2      2       6      10         4
3      3       4      12         8
4      4       8      14         6
5      5       4      13         8
6      6       6      11         8
7      7       8      10         4
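In the last row of the output, the running sum (8) never reaches part_2 (10), so argmax falls back to position 0 and the row shows 4 even though nothing matched. The sketch below is one possible way to mark such rows as NaN instead. It rebuilds the sample frame from the printed output (without the index column), uses np.tile to make the row-wise layout explicit, and writes to a column named first_val, which is a made-up name for this example.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the printed output above.
df = pd.DataFrame({
    "part_1": [4, 5, 6, 4, 8, 4, 6, 8],
    "part_2": [11, 12, 10, 12, 14, 13, 11, 10],
})

p1 = df["part_1"].to_numpy()
p2 = df["part_2"].to_numpy()
n = len(p1)

# Row i holds part_1[i:], with zeros before position i.
shifted = np.triu(np.tile(p1, (n, 1)))
cs = shifted.cumsum(axis=1)

hit = cs >= p2[:, None]   # same >= comparison as in the answer
ix = hit.argmax(axis=1)   # first True per row, or 0 when there is none
found = hit.any(axis=1)   # rows where the target is actually reached

# 'first_val' is a hypothetical column name; NaN marks rows with no match.
df["first_val"] = np.where(found, p1[ix], np.nan)
print(df)
```

Because NaN is a float, the new column is stored as float64. A nullable integer dtype could be used instead if integer output matters.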


 