How to groupby and resample data in pandas?
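I have sales data for different customers on different dates. The dates are not continuous, and I want to resample the data to daily frequency for each customer. How can I do this?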
import numpy as np
import pandas as pd
df = pd.DataFrame({'id': list('aababcbc'),
                   'date': pd.date_range('2022-01-01', periods=8),
                   'value': range(8)}).sort_values('id')
df
id date value
0 a 2022-01-01 0
1 a 2022-01-02 1
3 a 2022-01-04 3
2 b 2022-01-03 2
4 b 2022-01-05 4
6 b 2022-01-07 6
5 c 2022-01-06 5
7 c 2022-01-08 7
The desired output is as follows:
id date value
a 2022-01-01 0
a 2022-01-02 1
a 2022-01-03 0 ** there is no data for a on this day
a 2022-01-04 3
b 2022-01-03 2
b 2022-01-04 0 ** there is no data for b on this day
b 2022-01-05 4
b 2022-01-06 0 ** there is no data for b on this day
b 2022-01-07 6
c 2022-01-06 5
c 2022-01-07 0 ** there is no data for c on this day
c 2022-01-08 7
df.groupby(['id']).resample('D',on='date')['value'].sum().reset_index()
df["date"] = pd.to_datetime(df["date"])
df.set_index("date").groupby("id").resample("1d").sum()
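A self-contained sketch of the same groupby-then-resample idea, rebuilt from the sample data in the question. Selecting `value` before resampling is an assumption of this sketch (so that only that column is aggregated); the name `out` is illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": list("aababcbc"),
    "date": pd.date_range("2022-01-01", periods=8),
    "value": range(8),
})

# For each id, resample to daily bins between that id's first and last date.
# sum() over an empty bin gives 0, which fills the missing days.
out = (
    df.set_index("date")
      .groupby("id")["value"]
      .resample("D")
      .sum()
      .reset_index()
)
print(out)
```

Each `id` is only filled between its own first and last date, which matches the desired output above.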
def f(df):
    # Resample one id's rows to daily frequency; days without data sum to 0
    return df.resample('D', on='date')['value'].sum()
df.groupby(['id']).apply(f).reset_index()
This produces:
id date value
0 a 2022-01-01 0
1 a 2022-01-02 1
2 a 2022-01-03 0
3 a 2022-01-04 3
4 b 2022-01-03 2
5 b 2022-01-04 0
6 b 2022-01-05 4
7 b 2022-01-06 0
8 b 2022-01-07 6
9 c 2022-01-06 5
10 c 2022-01-07 0
11 c 2022-01-08 7
Here is the solution I came up with:
df.groupby(['id']).apply(lambda x: x.resample('D',on='date')['value'].sum()).reset_index()
id date value
0 a 2022-01-01 0
1 a 2022-01-02 1
2 a 2022-01-03 0
3 a 2022-01-04 3
4 b 2022-01-03 2
5 b 2022-01-04 0
6 b 2022-01-05 4
7 b 2022-01-06 0
8 b 2022-01-07 6
9 c 2022-01-06 5
10 c 2022-01-07 0
11 c 2022-01-08 7
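For comparison, here is a more explicit variant that is not taken from the answers above: it builds the full daily range per `id` and reindexes onto it, which spells out what the resample-and-sum approach does for this data. This is a sketch that assumes each `id` has at most one row per date; the names `parts`, `full` and `out` are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": list("aababcbc"),
    "date": pd.date_range("2022-01-01", periods=8),
    "value": range(8),
})

parts = []
for key, g in df.groupby("id"):
    # Every calendar day between this id's first and last date
    full = pd.date_range(g["date"].min(), g["date"].max(), freq="D", name="date")
    # Missing days get value 0; raises if a date is duplicated within an id
    s = g.set_index("date")["value"].reindex(full, fill_value=0)
    parts.append(s.reset_index().assign(id=key))

out = pd.concat(parts, ignore_index=True)[["id", "date", "value"]]
print(out)
```

Unlike `resample(...).sum()`, this version does not merge several rows that fall on the same day, so the resample-based answers remain the better fit when duplicates are possible.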