
Datetime upsampling

I have a DataFrame like this:

import pandas as pd

rows = [['bob', '01/2017', 12],
        ['bob', '02/2017', 14],
        ['bob', '03/2017', 16],
        ['julia', '01/2017', 18],
        ['julia', '02/2017', 16],
        ['julia', '03/2017', 24]]

df = pd.DataFrame(rows, columns=['name', 'date', 'val'])

Assuming each month has four weeks (I will use a lookup to match each month to its number of weeks, but for simplicity assume 4), I want to create a row for each person for each week of the month, where the value is the month's value divided by 4 (or n_weeks).

I tried using .resample() and .asfreq(), but both complained that I needed a unique index.

How can I do this in pandas?

EDIT

OK, so I got this:

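# Parse the 'MM/YYYY' strings so the column can serve as a DatetimeIndex
df['date'] = pd.to_datetime(df['date'], format='%m/%Y')

# Upsample each person's rows to weekly frequency, forward-filling the monthly value
weekly = df.groupby('name').apply(lambda g: g.set_index('date').resample('W').ffill().reset_index()).reset_index(drop=True)

weekly['val'] = weekly.val / 4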

    date    name    val
0   2017-01-01  bob 3
1   2017-01-08  bob 3
2   2017-01-15  bob 3
3   2017-01-22  bob 3
4   2017-01-29  bob 3
5   2017-02-05  bob 3.5
6   2017-02-12  bob 3.5
7   2017-02-19  bob 3.5
8   2017-02-26  bob 3.5
9   2017-03-05  bob 4
10  2017-01-01  julia   4.5
11  2017-01-08  julia   4.5
12  2017-01-15  julia   4.5
13  2017-01-22  julia   4.5
14  2017-01-29  julia   4.5
15  2017-02-05  julia   4
16  2017-02-12  julia   4
17  2017-02-19  julia   4
18  2017-02-26  julia   4
19  2017-03-05  julia   6

My remaining problem is that it does not distribute the last month of each group: March gets only one weekly row.
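The truncated last month follows from how resampling picks its range: the weekly bins only run up to the one containing the last timestamp in the index, and for March that timestamp is the 1st. A minimal sketch of this for a single person, using a hypothetical Series `s` that stands in for bob's rows:

```python
import pandas as pd

# Hypothetical stand-in for one person's monthly values, indexed by month start.
s = pd.Series(
    [12, 14, 16],
    index=pd.to_datetime(['2017-01-01', '2017-02-01', '2017-03-01']),
)

# Weekly bins (labelled by Sunday) stop at the bin holding the last
# timestamp, 2017-03-01, so March contributes a single weekly row.
weekly_s = s.resample('W').ffill()

print(weekly_s.index.min(), weekly_s.index.max(), len(weekly_s))
```

To cover the whole of March, the weekly dates have to be extended to the end of the last month rather than taken from the data alone.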

Someone posted a partial answer but deleted it before I could copy it. I think I figured out what they were going for.

Starting from this DataFrame (created in the question):

    name    date    val
0   bob 01/2017 12
1   bob 02/2017 14
2   bob 03/2017 16
3   julia   01/2017 18
4   julia   02/2017 16
5   julia   03/2017 24

I can do:

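import pandas as pd
from pandas.tseries.offsets import MonthBegin, MonthEnd

df['date'] = pd.to_datetime(df['date'], format='%m/%Y')

# Span from the start of the first month to the end of the last month
min_date = df.date.min() + MonthBegin(0)
max_date = df.date.max() + MonthEnd(0)
dr = pd.date_range(min_date, max_date, freq='W')

# Reindex each person's rows onto the weekly dates, forward-filling the values
weekly = df.groupby('name').apply(lambda g: g.set_index('date')
         .reindex(dr, method='pad').reset_index()).reset_index(drop=True)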

and get:

    index      name val
0   2017-01-01  bob 12
1   2017-01-08  bob 12
2   2017-01-15  bob 12
3   2017-01-22  bob 12
4   2017-01-29  bob 12
5   2017-02-05  bob 14
6   2017-02-12  bob 14
7   2017-02-19  bob 14
8   2017-02-26  bob 14
9   2017-03-05  bob 16
10  2017-03-12  bob 16
11  2017-03-19  bob 16
12  2017-03-26  bob 16
13  2017-01-01  julia   18
14  2017-01-08  julia   18
15  2017-01-15  julia   18
16  2017-01-22  julia   18
17  2017-01-29  julia   18
18  2017-02-05  julia   16
19  2017-02-12  julia   16
20  2017-02-19  julia   16
21  2017-02-26  julia   16
22  2017-03-05  julia   24
23  2017-03-12  julia   24
24  2017-03-19  julia   24
25  2017-03-26  julia   24
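The table above still holds the full monthly value on every weekly row, so the division by the number of weeks remains to be done. A possible sketch of that last step follows. It assumes that "weeks in a month" means the number of Sunday-anchored weekly dates falling in that month (5 for January 2017, 4 for February and March) rather than a fixed 4 or a custom lookup. The names `parts`, `month`, and `n_weeks` are illustrative, and the explicit loop replaces groupby().apply() only to keep the column handling obvious.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin, MonthEnd

rows = [['bob', '01/2017', 12], ['bob', '02/2017', 14], ['bob', '03/2017', 16],
        ['julia', '01/2017', 18], ['julia', '02/2017', 16], ['julia', '03/2017', 24]]
df = pd.DataFrame(rows, columns=['name', 'date', 'val'])
df['date'] = pd.to_datetime(df['date'], format='%m/%Y')

# Weekly (Sunday-anchored) dates from the start of the first month
# to the end of the last month.
dr = pd.date_range(df['date'].min() + MonthBegin(0),
                   df['date'].max() + MonthEnd(0), freq='W')

# Build one forward-filled weekly frame per person, then stack them.
parts = []
for name, g in df.groupby('name'):
    w = g.set_index('date')[['val']].reindex(dr, method='ffill')
    w.insert(0, 'name', name)
    parts.append(w.rename_axis('week').reset_index())
weekly = pd.concat(parts, ignore_index=True)

# Assumption: n_weeks is the count of weekly rows in each (person, month).
month = weekly['week'].dt.to_period('M')
n_weeks = weekly.groupby(['name', month])['val'].transform('count')

# Split each monthly value evenly across that month's weekly rows.
weekly['val'] = weekly['val'] / n_weeks

print(weekly)
```

With this choice, each person's weekly values add back up to their monthly totals. If a fixed divisor or a lookup table is preferred, `n_weeks` can be replaced by a constant or by a mapping from `month` to the desired count.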
