Reset cumulative sum based on a condition in pandas
I have a data frame like:
customer spend hurdle
A 20 50
A 31 50
A 20 50
B 50 100
B 51 100
B 30 100
I want to add a Cumulative column holding each customer's running sum of spend, which resets once the running sum is greater than or equal to the hurdle, like the following:
customer spend hurdle Cumulative
A 20 50 20
A 31 50 51
A 20 50 20
B 50 100 50
B 51 100 101
B 30 100 30
I used cumsum and groupby in pandas, but I do not know how to reset the sum based on the condition. This is the code I am currently using:
df1['cum_sum'] = df1.groupby('customer')['spend'].cumsum()
I know this is just a normal cumulative sum. I would really appreciate your help.
There may be a faster, more efficient way. Here is one inefficient way to do it with apply:
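def custcum(x):
    x = x.copy()  # work on a copy instead of mutating the group pandas passes in
    total = 0
    for i, v in x.iterrows():
        total += v.spend
        x.loc[i, 'cum'] = total
        # reset the running total once it reaches the hurdle
        if total >= v.hurdle:
            total = 0
    return x

# group_keys=False keeps the original row index instead of prefixing the customer key
df.groupby('customer', group_keys=False).apply(custcum)
Output: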
customer spend hurdle cum
0 A 20 50 20.0
1 A 31 50 51.0
2 A 20 50 20.0
3 B 50 100 50.0
4 B 51 100 101.0
5 B 30 100 30.0
You may consider using Cython or Numba to speed up custcum.
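As a sketch of the Cython/Numba suggestion, the same loop can run over plain NumPy arrays so that a JIT compiler can handle it. This assumes the optional numba package and its `njit` decorator; if numba is not installed, the fallback decorator leaves the function as ordinary Python. The names `reset_cumsum` and `group_start` are hypothetical, and the rows of each customer are assumed to be contiguous, as in the sample.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit  # optional: JIT-compile the loop if numba is installed
except ImportError:
    def njit(func):  # fallback: keep the plain Python loop unchanged
        return func


@njit
def reset_cumsum(spend, hurdle, group_start):
    """Running sum of `spend` that resets after it reaches `hurdle`.

    `group_start[i]` is True on the first row of each customer, which
    forces the running total back to zero at customer boundaries.
    """
    out = np.empty(spend.shape[0], dtype=spend.dtype)
    total = 0
    for i in range(spend.shape[0]):
        if group_start[i]:
            total = 0
        total += spend[i]
        out[i] = total
        if total >= hurdle[i]:
            total = 0
    return out


# Sample data from the question.
df = pd.DataFrame({
    'customer': ['A', 'A', 'A', 'B', 'B', 'B'],
    'spend': [20, 31, 20, 50, 51, 30],
    'hurdle': [50, 50, 50, 100, 100, 100],
})
# Assumes each customer's rows are contiguous (sort by customer first otherwise).
group_start = (df['customer'] != df['customer'].shift()).to_numpy()
df['Cumulative'] = reset_cumsum(df['spend'].to_numpy(),
                                df['hurdle'].to_numpy(),
                                group_start)
print(df)
```

Passing a boundary mask instead of calling groupby keeps the whole computation in one pass over the arrays, which is what lets a JIT compiler remove the per-row Python overhead of iterrows.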
[Update]
An improved version of Ido's answer:
import numpy as np

s = df.groupby('customer').spend.cumsum()
np.where(s > df.hurdle.shift(-1), s, df.spend)
Output: array([ 20, 51, 20, 50, 101, 30], dtype=int64)
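Printing the intermediate values on the question's sample helps show why this expression reproduces the expected column. The sketch rebuilds the sample `df`; the names `next_hurdle`, `result`, `counter` and `counter_result` are illustrative, and `counter` is made-up data. The expression compares the un-reset running total with the next row's hurdle instead of carrying a reset forward, so on other data it can diverge from the loop-based `custcum`, as the last lines show.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'customer': ['A', 'A', 'A', 'B', 'B', 'B'],
    'spend': [20, 31, 20, 50, 51, 30],
    'hurdle': [50, 50, 50, 100, 100, 100],
})

# Per-customer running total that never resets.
s = df.groupby('customer').spend.cumsum()
# Hurdle of the next row; the last row gets NaN, and comparisons with NaN are False.
next_hurdle = df.hurdle.shift(-1)
# Keep the running total where it exceeds the next row's hurdle, else use spend.
result = np.where(s > next_hurdle, s, df.spend)
print(s.tolist())       # [20, 51, 71, 50, 101, 131]
print(result.tolist())  # [20, 51, 20, 50, 101, 30]

# Made-up counterexample: a true reset would give [20, 51, 20, 60].
counter = pd.DataFrame({'customer': ['A'] * 4,
                        'spend': [20, 31, 20, 40],
                        'hurdle': [50] * 4})
s2 = counter.groupby('customer').spend.cumsum()
counter_result = np.where(s2 > counter.hurdle.shift(-1), s2, counter.spend)
print(counter_result.tolist())  # [20, 51, 71, 40]
```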
One way would be the code below, but it is a really inefficient and inelegant one-liner:
df1.groupby('customer').apply(lambda x: (x['spend'].cumsum() *(x['spend'].cumsum() > x['hurdle']).astype(int).shift(-1)).fillna(x['spend']))
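The one-liner can be unpacked the same way on a single customer's rows. This sketch applies the expression directly to one group (customer A from the sample) instead of going through groupby.apply; `g` and `g2` are illustrative names, and `g2` is made-up data. Rows whose shifted flag is 0 are multiplied to 0 rather than left as NaN, so `fillna` only falls back to spend on the group's last row, which means the result depends on the shape of the data.

```python
import pandas as pd

# One customer's rows, taken from the question's sample (customer A).
g = pd.DataFrame({'spend': [20, 31, 20], 'hurdle': [50, 50, 50]})

running = g['spend'].cumsum()                               # 20, 51, 71
flag_next = (running > g['hurdle']).astype(int).shift(-1)   # 1.0, 1.0, NaN
result = (running * flag_next).fillna(g['spend'])           # 20.0, 51.0, 20.0
print(result.tolist())  # [20.0, 51.0, 20.0]

# Made-up group where a true reset would give [20, 30, 60].
g2 = pd.DataFrame({'spend': [20, 10, 30], 'hurdle': [50, 50, 50]})
running2 = g2['spend'].cumsum()                             # 20, 30, 60
result2 = (running2 * (running2 > g2['hurdle']).astype(int).shift(-1)).fillna(g2['spend'])
print(result2.tolist())  # [0.0, 30.0, 30.0]
```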