pandas intraday 8Min resample bug?
It seems that for 1Min bar data, resample() has a bug whenever the sampling frequency is a multiple of 8. The code below illustrates it by resampling at [3, 5, 6, 8, 16] Min. For frequencies 3, 5, and 6, the first entry of the resampled DataFrame index starts at the base timestamp (9:30 in this case), while for frequencies 8 and 16 the resampled index starts at 9:26 and 9:18 respectively.
import pandas as pd
import datetime as dt
import numpy as np

datetime_start = dt.datetime(2014, 9, 1, 9, 30)
datetime_end = dt.datetime(2014, 9, 1, 16, 0)
tt = pd.date_range(datetime_start, datetime_end, freq='1Min')
df = pd.DataFrame(np.arange(len(tt)), index=tt, columns=['A'])
for freq in [3, 5, 6, 8, 16]:
    print(freq)
    # how= and base= are the resample arguments of the pandas version used here
    print(df.resample(str(freq) + 'Min', how='first', base=30).head(2))
This produces the following output:
3
                     A
2014-09-01 09:30:00  0
2014-09-01 09:33:00  3
5
                     A
2014-09-01 09:30:00  0
2014-09-01 09:35:00  5
6
                     A
2014-09-01 09:30:00  0
2014-09-01 09:36:00  6
8
                     A
2014-09-01 09:26:00  0
2014-09-01 09:34:00  4
16
                     A
2014-09-01 09:18:00  0
2014-09-01 09:34:00  4
I think resample anchors its bins at 00:00:00, so I shift the index so that it starts at 00:00, resample, and then shift it back.
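The arithmetic behind the 9:26 and 9:18 starts can be checked by hand. The sketch below uses only the standard library and assumes, as suggested above, that bin edges sit at midnight plus `base` minutes plus a whole number of bins. `first_bin_start` is a hypothetical helper written for illustration, not a pandas function.

```python
from datetime import datetime, timedelta

def first_bin_start(first_ts, freq_min, base_min=0):
    """Hypothetical helper: left edge of the bin that contains first_ts,
    assuming edges sit at midnight + base_min + k * freq_min minutes."""
    midnight = first_ts.replace(hour=0, minute=0, second=0, microsecond=0)
    minutes = int((first_ts - midnight).total_seconds() // 60)
    shift = base_min % freq_min  # only the remainder of base moves the edges
    start = (minutes - shift) // freq_min * freq_min + shift
    return midnight + timedelta(minutes=start)

first = datetime(2014, 9, 1, 9, 30)
starts = {f: first_bin_start(first, f, base_min=30).strftime("%H:%M")
          for f in [3, 5, 6, 8, 16]}
print(starts)
```

Under this assumption, base=30 leaves a remainder of 6 for 8Min and 14 for 16Min, so 9:30 does not fall on a bin edge for those two frequencies. With a base of 9*60+30 = 570 minutes the remainders are 2 and 10, and 9:30 becomes an edge for both.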
Method 1: shift the index so that it starts at 00:00, resample, then shift it back.

import pandas as pd
import datetime as dt
import numpy as np

datetime_start = dt.datetime(2014, 9, 1, 9, 30)
datetime_end = dt.datetime(2014, 9, 1, 16, 30)
tt = pd.date_range(datetime_start, datetime_end, freq='1Min')
df = pd.DataFrame(np.arange(len(tt)), index=tt, columns=['A'])
offsets = pd.offsets.Hour(9) + pd.offsets.Minute(30)
for freq in [1, 3, 5, 6, 8, 16]:
    print(freq)
    shifted = df.copy()
    shifted.index = shifted.index - offsets
    # Keep the result in a new variable so every pass resamples the original 1Min data
    out = shifted.resample(str(freq) + 'Min').agg({'A': 'first'})
    out.index = out.index + offsets
    print(out.head(2))

Method 2: use base as the index offset, given in minutes since midnight.

import pandas as pd
import datetime as dt
import numpy as np

datetime_start = dt.datetime(2014, 9, 1, 9, 30)
datetime_end = dt.datetime(2014, 9, 1, 16, 30)
tt = pd.date_range(datetime_start, datetime_end, freq='1Min')
df = pd.DataFrame(np.arange(len(tt)), index=tt, columns=['A'])
for freq in [1, 3, 5, 6, 8, 16]:
    print(freq)
    # Keep the result in a new variable so every pass resamples the original 1Min data
    out = df.resample(str(freq) + 'Min', base=9*60+30).agg({'A': 'first'})
    print(out.head(2))

The output is then:

1
                     A
2014-09-01 09:30:00  0
2014-09-01 09:31:00  1
3
                     A
2014-09-01 09:30:00  0
2014-09-01 09:33:00  3
5
                     A
2014-09-01 09:30:00  0
2014-09-01 09:35:00  5
6
                     A
2014-09-01 09:30:00  0
2014-09-01 09:36:00  6
8
                     A
2014-09-01 09:30:00  0
2014-09-01 09:38:00  8
16
                      A
2014-09-01 09:30:00   0
2014-09-01 09:46:00  16
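On newer pandas the same anchoring can be requested directly. The following is a minimal sketch, assuming pandas 1.1 or later, where `resample` accepts an `origin` argument and `base` is deprecated in favor of `origin` and `offset`. Passing `origin='start'` anchors the bins at the first timestamp of the series instead of at midnight. The `firsts` dictionary is only there to collect the first bin label for each frequency.

```python
import numpy as np
import pandas as pd

# Same 1-minute data as above, 09:30 to 16:00
tt = pd.date_range("2014-09-01 09:30", "2014-09-01 16:00", freq="1min")
df = pd.DataFrame({"A": np.arange(len(tt))}, index=tt)

firsts = {}
for freq in [3, 5, 6, 8, 16]:
    # origin="start" anchors bin edges at the first timestamp (assumes pandas >= 1.1)
    out = df.resample(f"{freq}min", origin="start").first()
    firsts[freq] = out.index[0]
    print(freq)
    print(out.head(2))
```

With this setting every frequency in the loop should start its first bin at 09:30, without shifting the index by hand or computing a base in minutes.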