Cumulative sum (pandas)
Apologies if this has been asked already.
I am trying to create a yearly cumulative sum for all order points within a certain customer account, and am struggling.
Essentially, I want to create the YearlyTotal column below:
Customer Year Date Order PointsPerOrder YearlyTotal
123456 2016 11/2/16 A939 1 20
123456 2016 3/13/16 A102 19 19
789089 2016 7/15/16 A123 7 7
I've tried:
df['YEARLYTOTAL'] = df.groupby(by=['Customer','Year'])['PointsPerOrder'].cumsum()
But this produces YearlyTotal in the wrong order (i.e., the YearlyTotal of A939 is 1 instead of 20).
Not sure if this matters, but Customer is a string (the database has leading zeroes -- don't get me started). Putting sort_values(by=['Customer','Year','Date'],ascending=True) at the front also produces an error.
Help?
Use [::-1] to reverse the DataFrame before grouping (this works here because the rows are listed newest first within each customer):
df['YEARLYTOTAL'] = df[::-1].groupby(by=['Customer','Year'])['PointsPerOrder'].cumsum()
print (df)
Customer Year Date Order PointsPerOrder YearlyTotal YEARLYTOTAL
0 123456 2016 11/2/16 A939 1 20 20
1 123456 2016 3/13/16 A102 19 19 19
2 789089 2016 7/15/16 A123 7 7 7
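As a rough illustration of why the reversal works, the sketch below rebuilds the sample data from the question and applies the same [::-1] trick. The result of cumsum keeps the original index labels, so assigning it back lines each total up with the right row. The second half reorders the rows (an assumption made purely for demonstration) to show that the trick depends on the rows being stored newest first.

```python
import pandas as pd

# Sample data copied from the question; Customer is kept as a string.
df = pd.DataFrame({
    "Customer": ["123456", "123456", "789089"],
    "Year": [2016, 2016, 2016],
    "Date": ["11/2/16", "3/13/16", "7/15/16"],
    "Order": ["A939", "A102", "A123"],
    "PointsPerOrder": [1, 19, 7],
})

# Rows are newest first, so reversing them yields oldest first.
# cumsum() returns a Series carrying the original index labels,
# so the assignment aligns each total with its own row.
df["YEARLYTOTAL"] = (
    df[::-1].groupby(["Customer", "Year"])["PointsPerOrder"].cumsum()
)
print(df)

# Hypothetical reordering: put A102 (the older order) first.
shuffled = df.drop(columns="YEARLYTOTAL").iloc[[1, 0, 2]].reset_index(drop=True)
shuffled["YEARLYTOTAL"] = (
    shuffled[::-1].groupby(["Customer", "Year"])["PointsPerOrder"].cumsum()
)
# Here the reversal no longer gives chronological order,
# so A939 ends up with 1 again instead of 20.
print(shuffled)
```

If the stored row order cannot be guaranteed, sorting by an actual date column is probably the safer route.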
First make sure Date is a datetime column:
In [35]: df.Date = pd.to_datetime(df.Date)
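To see why the conversion matters, here is a small sketch comparing how the sample dates sort as plain strings versus as datetimes. The explicit format "%m/%d/%y" is an assumption based on the month/day/year values shown in the question; adjust it if the real data differs.

```python
import pandas as pd

dates = pd.Series(["11/2/16", "3/13/16", "7/15/16"])

# As strings, sorting is lexical: "11/2/16" sorts before "3/13/16"
# even though November comes after March.
string_order = dates.sort_values().tolist()
print(string_order)

# Assumed format: month/day/two-digit year.
parsed = pd.to_datetime(dates, format="%m/%d/%y")

# As datetimes, sorting is chronological.
date_order = parsed.sort_values().dt.strftime("%Y-%m-%d").tolist()
print(date_order)
```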
Now we can do:
In [36]: df['YearlyTotal'] = df.sort_values('Date').groupby(['Customer','Year'])['PointsPerOrder'].cumsum()
In [37]: df
Out[37]:
Customer Year Date Order PointsPerOrder YearlyTotal
0 123456 2016 2016-11-02 A939 1 20
1 123456 2016 2016-03-13 A102 19 19
2 789089 2016 2016-07-15 A123 7 7
PS: this solution does NOT depend on the order of the records.
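The two steps above can be put together in one self-contained sketch. The helper name add_yearly_total and the explicit date format are assumptions introduced for illustration, not part of the original answer. The example runs the helper on the sample data and on a reordered copy to check that the totals come out the same either way.

```python
import pandas as pd

def add_yearly_total(frame):
    """Return a copy with a per-customer, per-year running total (hypothetical helper)."""
    out = frame.copy()
    # Assumed format: month/day/two-digit year, as in the question's sample.
    out["Date"] = pd.to_datetime(out["Date"], format="%m/%d/%y")
    # Sort chronologically, then accumulate within each (Customer, Year) group.
    # The result keeps the original index labels, so it aligns on assignment.
    out["YearlyTotal"] = (
        out.sort_values("Date")
           .groupby(["Customer", "Year"])["PointsPerOrder"]
           .cumsum()
    )
    return out

df = pd.DataFrame({
    "Customer": ["123456", "123456", "789089"],
    "Year": [2016, 2016, 2016],
    "Date": ["11/2/16", "3/13/16", "7/15/16"],
    "Order": ["A939", "A102", "A123"],
    "PointsPerOrder": [1, 19, 7],
})

result = add_yearly_total(df)
print(result)

# Same rows in a different order give the same total per order.
shuffled_result = add_yearly_total(df.iloc[[1, 2, 0]])
print(shuffled_result)
```

Because the running total is computed after sorting by Date, the layout of the input rows should not affect which total each order receives.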