[英]Cumulative sum over days in python
I have the following dataframe: 我有以下数据帧:
date money
0 2018-01-01 20
1 2018-01-05 30
2 2018-02-15 7
3 2019-03-17 150
4 2018-01-05 15
...
2530 2019-03-17 350
And I need: 我需要:
[(2018-01-01,20),(2018-01-05,65),(2018-02-15,72),...,(2019-03-17,572)]
So i need to do a cumulative sum of money over all days: So far I have tried many things and the closest Ithink I've got is: 因此,我需要在所有日子里累积一笔钱:到目前为止,我已经尝试了很多东西,而我最接近的Ithink是:
graph_df.date = pd.to_datetime(graph_df.date)
temporary = graph_df.groupby('date').money.sum()
temporary = temporary.groupby(temporary.index.to_period('date')).cumsum().reset_index()
But this gives me ValueError: Invalid frequency: date 但这给了我ValueError:无效的频率:日期
Could anyone help please? 有人可以帮忙吗?
Thanks 谢谢
I don't think you need the second groupby. 我认为你不需要第二组。 You can simply add a column with the cumulative sum. 您只需添加累积总和的列即可。
This does the trick for me: 这对我有用:
import pandas as pd
df = pd.DataFrame({'date': ['01-01-2019','04-06-2019', '07-06-2019'], 'money': [12,15,19]})
df['date'] = pd.to_datetime(df['date']) # this is not strictly needed
tmp = df.groupby('date')['money'].sum().reset_index()
tmp['money_sum'] = tmp['money'].cumsum()
Converting the date column to an actual date is not needed for this to work. 为此,不需要将日期列转换为实际日期。
list(map(tuple, df.groupby('date', as_index=False)['money'].sum().values))
Edit : 编辑 :
df = pd.DataFrame({'date': ['2018-01-01', '2018-01-05', '2018-02-15', '2019-03-17', '2018-01-05'],
'money': [20, 30, 7, 150, 15]})
#df['date'] = pd.to_datetime(df['date'])
#df = df.sort_values(by='date')
temporary = df.groupby('date', as_index=False)['money'].sum()
temporary['money_cum'] = temporary['money'].cumsum()
Result: 结果:
>>> list(map(tuple, temporary[['date', 'money_cum']].values))
[('2018-01-01', 20),
('2018-01-05', 65),
('2018-02-15', 72),
('2019-03-17', 222)]
you can try using df.groupby('date').sum()
: 你可以尝试使用df.groupby('date').sum()
:
example data frame: 示例数据框:
df
date money
0 01/01/2018 20
1 05/01/2018 30
2 15/02/2018 7
3 17/03/2019 150
4 05/01/2018 15
5 17/03/2019 550
6 15/02/2018 13
df['cumsum'] = df.money.cumsum()
list(zip(df.groupby('date').tail(1)['date'], df.groupby('date').tail(1)['cumsum']))
[('01/01/2018', 20),
('05/01/2018', 222),
('17/03/2019', 772),
('15/02/2018', 785)]
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.