Pandas: reset a cumulative sum based on a condition

Question

I have a data frame like:

customer spend hurdle 
A         20    50      
A         31    50      
A         20    50      
B         50    100     
B         51    100    
B         30    100

I want to calculate an additional column, Cumulative, that restarts for the same customer once the cumulative sum is greater than or equal to the hurdle, like the following:

customer spend hurdle Cumulative 
A         20    50      20
A         31    50      51
A         20    50      20
B         50    100     50
B         51    100    101
B         30    100     30
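The reset rule can be written as a small plain-Python sketch for one customer's rows. It assumes each customer has a single hurdle value (true in the sample), and reset_cumsum is a hypothetical helper name, not part of pandas:

```python
def reset_cumsum(spends, hurdle):
    """Running total that restarts after it reaches the hurdle.

    The row that reaches (>=) the hurdle still shows the full total;
    the following row starts again from its own spend.
    """
    out = []
    total = 0
    for spend in spends:
        total += spend          # add this row's spend to the running total
        out.append(total)       # the row shows the total including itself
        if total >= hurdle:     # hurdle reached: start over on the next row
            total = 0
    return out


print(reset_cumsum([20, 31, 20], 50))    # customer A: [20, 51, 20]
print(reset_cumsum([50, 51, 30], 100))   # customer B: [50, 101, 30]
```

Both calls reproduce the Cumulative column above.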

I used cumsum and groupby in pandas, but I do not know how to reset the sum based on the condition.

Here is the code I am currently using:

df1['cum_sum'] = df1.groupby('customer')['spend'].cumsum()  # plain per-customer running total, never resets

I know this is just a normal cumulative sum. I really appreciate your help.
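For reference, here is the sample frame rebuilt in pandas together with the plain per-customer cumsum from the question. It shows where the plain sum diverges from the desired column: the third row of each customer keeps accumulating (71 and 131) instead of restarting at 20 and 30.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "customer": ["A", "A", "A", "B", "B", "B"],
    "spend": [20, 31, 20, 50, 51, 30],
    "hurdle": [50, 50, 50, 100, 100, 100],
})

# Plain per-customer running total: it never restarts
df1["cum_sum"] = df1.groupby("customer")["spend"].cumsum()
print(df1["cum_sum"].tolist())  # [20, 51, 71, 50, 101, 131]
```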

Answer 1

There may be a faster, more efficient way. Here is one (inefficient) way using apply:

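import pandas as pd

df = pd.DataFrame({'customer': list('AAABBB'),
                   'spend': [20, 31, 20, 50, 51, 30],
                   'hurdle': [50, 50, 50, 100, 100, 100]})

def custcum(x):
    x = x.copy()                  # work on a copy instead of mutating the group
    total = 0
    for i, v in x.iterrows():
        total += v.spend          # add this row's spend to the running total
        x.loc[i, 'cum'] = total   # record the running total for this row
        if total >= v.hurdle:     # hurdle reached: restart on the next row
            total = 0
    return x

# group_keys=False keeps the original row index in the result
df.groupby('customer', group_keys=False).apply(custcum)

Output:
  customer  spend  hurdle    cum
0        A     20      50   20.0
1        A     31      50   51.0
2        A     20      50   20.0
3        B     50     100   50.0
4        B     51     100  101.0
5        B     30     100   30.0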

You could consider using Cython or Numba to speed up custcum.
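Along the lines of that suggestion, a sketch of the loop rewritten over plain NumPy arrays instead of iterrows, which is the style of code Numba's njit decorator is designed to compile. Numba is deliberately not imported here so the block runs without it; reset_cumsum_arr is a hypothetical name. Each customer's rows are processed separately via the positional indices that GroupBy.indices returns.

```python
import numpy as np
import pandas as pd


def reset_cumsum_arr(spend, hurdle):
    # Plain loop over NumPy arrays (the kind of code numba.njit can compile;
    # the decorator is not applied here, so Numba is not required).
    out = np.empty(spend.shape[0], dtype=np.float64)
    total = 0.0
    for i in range(spend.shape[0]):
        total += spend[i]          # add this row's spend
        out[i] = total             # record the running total
        if total >= hurdle[i]:     # hurdle reached: restart on the next row
            total = 0.0
    return out


df = pd.DataFrame({
    "customer": ["A", "A", "A", "B", "B", "B"],
    "spend": [20, 31, 20, 50, 51, 30],
    "hurdle": [50, 50, 50, 100, 100, 100],
})

spend = df["spend"].to_numpy(dtype=np.float64)
hurdle = df["hurdle"].to_numpy(dtype=np.float64)
cum = np.empty(len(df))
# .indices maps each customer to the integer positions of its rows
for positions in df.groupby("customer").indices.values():
    cum[positions] = reset_cumsum_arr(spend[positions], hurdle[positions])
df["cum"] = cum
print(df["cum"].tolist())  # [20.0, 51.0, 20.0, 50.0, 101.0, 30.0]
```

With Numba installed, adding @numba.njit above reset_cumsum_arr should compile the loop unchanged, since it only uses array indexing and scalar arithmetic.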

[Update]

An improved version of Ido's answer:

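import numpy as np

s = df.groupby('customer').spend.cumsum()       # per-customer running total, no reset
np.where(s > df.hurdle.shift(-1), s, df.spend)  # keep s where it exceeds the next row's hurdle, else use spend
# array([ 20,  51,  20,  50, 101,  30])

Note that this shortcut does not actually restart the running sum. It matches the expected output here only because every row that falls back to df.spend starts a new run, so its running total equals its own spend. With spends 10, 20, 31 and a hurdle of 50, for example, it returns 10, 20, 31 instead of 10, 30, 61. It also compares against the next row's hurdle (which may belong to the next customer) and uses > rather than >=.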

Answer 2

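One way is the code below, but it is a really inefficient and inelegant one-liner.

df1.groupby('customer', group_keys=False).apply(lambda x: (x['spend'].cumsum() * (x['spend'].cumsum() > x['hurdle']).astype(int).shift(-1)).fillna(x['spend']))

It reproduces the sample output, but it does not restart the running sum in general: with spends 10, 20, 31 and a hurdle of 50 it returns 0, 30, 31 instead of 10, 30, 61.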
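To see how the one-liner arrives at the sample output, it can be unpacked step by step for customer A alone. The flag that keeps a row's running total comes from the following row (shift(-1)), and the last row of a group, whose shifted flag is NaN, falls back to its own spend through fillna.

```python
import pandas as pd

# Customer A from the question
g = pd.DataFrame({"spend": [20, 31, 20], "hurdle": [50, 50, 50]})

running = g["spend"].cumsum()                          # 20, 51, 71
flag = (running > g["hurdle"]).astype(int).shift(-1)   # next row's "over hurdle" flag: 1.0, 1.0, NaN
result = (running * flag).fillna(g["spend"])           # 20.0, 51.0, NaN -> replaced by spend 20

print(result.tolist())  # [20.0, 51.0, 20.0]
```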

Answer 1: score 3, 2017-10-17 07:25:57
Answer 2: score 1, 2017-10-17 07:19:44
