Datetime upsampling
I have a dataframe like this:
import pandas as pd

rows = [['bob', '01/2017', 12],
        ['bob', '02/2017', 14],
        ['bob', '03/2017', 16],
        ['julia', '01/2017', 18],
        ['julia', '02/2017', 16],
        ['julia', '03/2017', 24]]
df = pd.DataFrame(rows, columns=['name', 'date', 'val'])
Assuming that each month has four weeks (I will use a lookup to match each month to its number of weeks, but for simplicity assume 4), I want to create a row for each person for each week of the month, where the value is the month's value divided by 4 (or n_weeks).
I tried using .resample() and .asfreq(), but they told me I needed a unique index.
How can I do this in pandas?
EDIT
OK, so I got this:
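df['date'] = pd.to_datetime(df['date'], format='%m/%Y')  # resample needs a datetime index
weekly = df.groupby('name').apply(lambda g: g.set_index('date').resample('W').ffill().reset_index()).reset_index(drop=True)  # ffill() is the current name for pad()
weekly['val'] = weekly['val'] / 4  # assign the result so the division is kept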
date name val
0 2017-01-01 bob 3
1 2017-01-08 bob 3
2 2017-01-15 bob 3
3 2017-01-22 bob 3
4 2017-01-29 bob 3
5 2017-02-05 bob 3.5
6 2017-02-12 bob 3.5
7 2017-02-19 bob 3.5
8 2017-02-26 bob 3.5
9 2017-03-05 bob 4
10 2017-01-01 julia 4.5
11 2017-01-08 julia 4.5
12 2017-01-15 julia 4.5
13 2017-01-22 julia 4.5
14 2017-01-29 julia 4.5
15 2017-02-05 julia 4
16 2017-02-12 julia 4
17 2017-02-19 julia 4
18 2017-02-26 julia 4
19 2017-03-05 julia 6
My problem is still that it's not distributing the last month of each group.
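A likely reason, sketched here on one person's data: resampling only spans from the first to the last timestamp in the index, so the last month contributes a single weekly bin. The series `s` below is a stand-in built from bob's rows, not part of the original code.

```python
import pandas as pd

# Bob's monthly values, indexed by the first day of each month (illustrative stand-in).
s = pd.Series(
    [12, 14, 16],
    index=pd.to_datetime(['2017-01-01', '2017-02-01', '2017-03-01']),
)

# Weekly bins stop at the week that contains the last timestamp (2017-03-01),
# so the remaining weeks of March are never generated.
weekly = s.resample('W').ffill()

print(weekly.index.max())
print((weekly.index.month == 3).sum())  # number of weekly rows that fall in March
```

Extending the index to the end of the last month before filling is one way around this.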
Someone answered this partially but then deleted the answer before I could copy it. I think I figured out what they were going for.
Starting from this dataframe (created in the question):
name date val
0 bob 01/2017 12
1 bob 02/2017 14
2 bob 03/2017 16
3 julia 01/2017 18
4 julia 02/2017 16
5 julia 03/2017 24
I can do:
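from pandas.tseries.offsets import MonthBegin, MonthEnd

df['date'] = pd.to_datetime(df['date'], format='%m/%Y')
min_date = df['date'].min() + MonthBegin(0)  # start of the first month (no-op if already on it)
max_date = df['date'].max() + MonthEnd(0)    # end of the last month
dr = pd.date_range(min_date, max_date, freq='W')  # weekly dates, anchored on Sundays
# Reindex each person's rows onto the full weekly range, forward-filling the monthly value
weekly = df.groupby('name').apply(lambda g: g.set_index('date')
    .reindex(dr, method='pad').reset_index()).reset_index(drop=True)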
and get:
index name val
0 2017-01-01 bob 12
1 2017-01-08 bob 12
2 2017-01-15 bob 12
3 2017-01-22 bob 12
4 2017-01-29 bob 12
5 2017-02-05 bob 14
6 2017-02-12 bob 14
7 2017-02-19 bob 14
8 2017-02-26 bob 14
9 2017-03-05 bob 16
10 2017-03-12 bob 16
11 2017-03-19 bob 16
12 2017-03-26 bob 16
13 2017-01-01 julia 18
14 2017-01-08 julia 18
15 2017-01-15 julia 18
16 2017-01-22 julia 18
17 2017-01-29 julia 18
18 2017-02-05 julia 16
19 2017-02-12 julia 16
20 2017-02-19 julia 16
21 2017-02-26 julia 16
22 2017-03-05 julia 24
23 2017-03-12 julia 24
24 2017-03-19 julia 24
25 2017-03-26 julia 24
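The values above are still the monthly totals. A minimal sketch of the remaining division step follows, assuming that n_weeks is taken as the number of weekly (Sunday) dates that fall in each month rather than a fixed 4 or a lookup table. It rebuilds the weekly frame with an explicit loop instead of groupby().apply(); the names `parts`, `month`, and `n_weeks` are illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

rows = [['bob', '01/2017', 12], ['bob', '02/2017', 14], ['bob', '03/2017', 16],
        ['julia', '01/2017', 18], ['julia', '02/2017', 16], ['julia', '03/2017', 24]]
df = pd.DataFrame(rows, columns=['name', 'date', 'val'])
df['date'] = pd.to_datetime(df['date'], format='%m/%Y')

# Weekly (Sunday) dates from the first month start to the end of the last month.
dr = pd.date_range(df['date'].min(), df['date'].max() + MonthEnd(0), freq='W')

# Forward-fill each person's monthly value onto the weekly dates.
parts = []
for name, g in df.groupby('name'):
    filled = g.set_index('date')['val'].reindex(dr, method='pad')
    parts.append(pd.DataFrame({'date': dr, 'name': name, 'val': filled.to_numpy()}))
weekly = pd.concat(parts, ignore_index=True)

# Assumption: n_weeks = number of weekly rows per person per calendar month.
month = weekly['date'].dt.to_period('M')
n_weeks = weekly.groupby(['name', month])['val'].transform('count')
weekly['val'] = weekly['val'] / n_weeks

print(weekly)
```

Dividing by the per-month row count keeps each person's monthly total intact after the split. To use a fixed 4 or a custom lookup instead, replace `n_weeks` with that value.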