Python: Cumulative Sum with changing key
I have a table of data such as:
F(1) F(2) F(3) Amount
A B C 100
A B C 100
A B C 100
D E F 300
D E F 150
G H I 100
G H I 200
I would like to produce a new column showing the cumulative sum of the 'Amount' field, but one that resets to 0 whenever the key formed by columns F(1), F(2) and F(3) changes.
i.e. I would like to create the following output (without the dotted lines!):
F(1) F(2) F(3) Amount CumSum
A B C 100 100
A B C 100 200
A B C 100 300
------------------------------ resets to zero as key changes
D E F 300 300
D E F 150 450
------------------------------ resets to zero as key changes
G H I 100 100
G H I 200 300
I have potentially up to a million rows in this table, so I am looking for a robust implementation. Is pandas the way forward here? I have not used pandas before but am happy to explore.
Group by your key columns and call cumsum:
df['CumSum'] = df.groupby(['F(1)', 'F(2)', 'F(3)'])['Amount'].cumsum()
df
Out:
F(1) F(2) F(3) Amount CumSum
0 A B C 100 100
1 A B C 100 200
2 A B C 100 300
3 D E F 300 300
4 D E F 150 450
5 G H I 100 100
6 G H I 200 300