
pandas intraday 8Min resample bug?

It seems that for 1Min bar data, resample() has a bug when the sampling frequency is a multiple of 8. The code below illustrates this by resampling at [3, 5, 6, 8, 16] Min. For frequencies 3, 5 and 6, the first entry of the resampled dataframe index starts at the base timestamp (9:30 in this case), while for frequencies 8 and 16 the resampled index starts at 9:26 and 9:18 respectively.

import pandas as pd
import datetime as dt
import numpy as np

datetime_start = dt.datetime(2014, 9, 1, 9, 30)
datetime_end = dt.datetime(2014, 9, 1, 16, 0)

tt = pd.date_range(datetime_start, datetime_end, freq='1Min')
df = pd.DataFrame(np.arange(len(tt)), index=tt, columns=['A'])

for freq in [3, 5, 6, 8, 16]:
    print(freq)
    # how= and base= are resample() arguments from the pandas version in use when this was written
    print(df.resample(str(freq) + 'Min', how='first', base=30).head(2))

This produces the following output:

3
                     A
2014-09-01 09:30:00  0
2014-09-01 09:33:00  3
5
                     A
2014-09-01 09:30:00  0
2014-09-01 09:35:00  5
6
                     A
2014-09-01 09:30:00  0
2014-09-01 09:36:00  6
8
                     A
2014-09-01 09:26:00  0
2014-09-01 09:34:00  4
16
                     A
2014-09-01 09:18:00  0
2014-09-01 09:34:00  4

I think resample anchors its bins at 00:00:00, so I shift the index so that the first bar falls at 00:00, resample, and then shift the index back.
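The midnight anchoring can be checked with plain arithmetic. The sketch below assumes that bin edges sit at (base mod freq) minutes past midnight plus whole multiples of freq; that is an assumption about how resample treats base, not something taken from the pandas source, and first_bin_start and hhmm are hypothetical helper names.

```python
def first_bin_start(start_minute, freq, base):
    """Left edge (minutes past midnight) of the bin containing start_minute.

    Assumption: bin edges are at (base % freq) + k * freq minutes past midnight.
    """
    anchor = base % freq
    return anchor + ((start_minute - anchor) // freq) * freq


def hhmm(minutes):
    """Format minutes past midnight as HH:MM."""
    return "%02d:%02d" % divmod(minutes, 60)


start = 9 * 60 + 30  # 09:30 expressed as minutes past midnight

for freq in [3, 5, 6, 8, 16]:
    with_base_30 = hhmm(first_bin_start(start, freq, base=30))
    with_base_570 = hhmm(first_bin_start(start, freq, base=start))
    print(freq, with_base_30, with_base_570)
```

Under that assumption, base=30 gives 09:26 for 8Min and 09:18 for 16Min, matching the output in the question, while a base of 9*60+30 puts the first bin at 09:30 for every frequency in the list.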

Method 1: shift the index, resample, then shift it back.

import pandas as pd
import datetime as dt
import numpy as np

datetime_start = dt.datetime(2014, 9, 1, 9, 30)
datetime_end = dt.datetime(2014, 9, 1, 16, 30)

tt = pd.date_range(datetime_start, datetime_end, freq='1Min')
df = pd.DataFrame(np.arange(len(tt)), index=tt, columns=['A'])

offsets = pd.offsets.Hour(9) + pd.offsets.Minute(30)
for freq in [1, 3, 5, 6, 8, 16]:
    print(freq)
    df.index = df.index - offsets
    df = df.resample(str(freq) + 'T').agg({'A':'first'})
    df.index = df.index + offsets
    print(df.head(2))

Method 2: use base as the index offset.

import pandas as pd
import datetime as dt
import numpy as np

datetime_start = dt.datetime(2014, 9, 1, 9, 30)
datetime_end = dt.datetime(2014, 9, 1, 16, 30)

tt = pd.date_range(datetime_start, datetime_end, freq='1Min')
df = pd.DataFrame(np.arange(len(tt)), index=tt, columns=['A'])

for freq in [1, 3, 5, 6, 8, 16]:
    print(freq)
    df = df.resample(str(freq) + 'T',base=9*60+30).agg({'A':'first'})
    print(df.head(2))

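Both methods then produce the following output. Note that df is reassigned inside the loop, so each frequency resamples the result of the previous one rather than the original 1Min data, which is why the second value in the later blocks does not equal the bar width: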

1
                     A
2014-09-01 09:30:00  0
2014-09-01 09:31:00  1
3
                     A
2014-09-01 09:30:00  0
2014-09-01 09:33:00  3
5
                     A
2014-09-01 09:30:00  0
2014-09-01 09:35:00  6
6
                      A
2014-09-01 09:30:00   0
2014-09-01 09:36:00  12
8
                      A
2014-09-01 09:30:00   0
2014-09-01 09:38:00  15
16
                      A
2014-09-01 09:30:00   0
2014-09-01 09:46:00  21
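On newer pandas releases the base argument may not be available. Here is a minimal sketch of the same idea, assuming a pandas version whose resample() accepts the origin argument (1.1 or later, as far as I know). resample_from_first_bar is a hypothetical helper name, and unlike the loops above it always resamples the original 1Min frame instead of reassigning df.

```python
import numpy as np
import pandas as pd

# Same 1-minute test data as in the question: A counts minutes since 09:30.
tt = pd.date_range("2014-09-01 09:30", "2014-09-01 16:00", freq="1min")
df = pd.DataFrame(np.arange(len(tt)), index=tt, columns=["A"])


def resample_from_first_bar(frame, minutes):
    """Resample to `minutes`-wide bars whose first bin starts at frame.index[0].

    Assumption: origin="start" is supported by the installed pandas version.
    """
    rule = pd.Timedelta(minutes=minutes)
    return frame.resample(rule, origin="start").first()


for freq in [1, 3, 5, 6, 8, 16]:
    print(freq)
    print(resample_from_first_bar(df, freq).head(2))
```

Because every call starts from the untouched 1Min frame, the first bin is at 09:30 for each frequency and the second row holds the value freq.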
