
Resampling a pandas time series with forward data

My 30-minute df looks like this:

                         open      high      low     close    volume
t
2020-08-24 09:30:00  514.7900  515.1400  502.240  507.3700  12123388
2020-08-24 10:00:00  507.3200  513.9800  500.000  502.8899   6652496
2020-08-24 10:30:00  502.8190  503.7700  495.745  496.4879   5925417
2020-08-24 11:00:00  496.7865  504.4000  495.750  501.3500   4460389
2020-08-24 11:30:00  501.3400  508.6300  501.250  508.0800   3743261
2020-08-24 12:00:00  508.1100  514.7809  506.550  507.7000   3415871
2020-08-24 12:30:00  507.7000  507.9000  504.240  504.8050   2864729
2020-08-24 13:00:00  504.7250  508.0000  504.000  505.1700   2374089
2020-08-24 13:30:00  505.1707  506.7220  503.120  506.0150   2207964
2020-08-24 14:00:00  506.0700  507.0800  503.670  504.1742   2227575
2020-08-24 14:30:00  504.1800  514.6800  501.100  501.7300   2676025
2020-08-24 15:00:00  501.7100  503.4200  498.620  503.2265   3971955
2020-08-24 15:30:00  503.2330  504.5150  501.546  503.7900   4239235

I am using the resample method to get hourly data, with agg to take the first open, the last close, the highest high, the lowest low, and the summed volume:

df = df.resample('H', loffset='30Min').agg({'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volume': 'sum'})

This gives me:

                         open      high      low     close    volume
t
2020-08-24 09:30:00  512.7500  515.9800  502.240  507.3700  12628715
2020-08-24 10:30:00  507.3200  513.9800  495.745  496.4879  12577913
2020-08-24 11:30:00  496.7865  508.6300  495.750  508.0800   8203650
2020-08-24 12:30:00  508.1100  514.7809  504.240  504.8050   6280600
2020-08-24 13:30:00  504.7250  508.0000  503.120  506.0150   4582053
2020-08-24 14:30:00  506.0700  514.6800  501.100  501.7300   4903600
2020-08-24 15:30:00  501.7100  504.5150  498.620  503.7900   8211190

df.resample combines the 10:00 and 10:30 rows and labels the result 10:30.

For example, the newly generated row:

2020-08-24 10:30:00  507.3200  513.9800  495.745  496.4879  12577913

Its open price, 507.32, belongs to 2020-08-24 10:00:00. The rows should instead be paired as in the image below:

[image: enter image description here]

The desired df is shown below. Each row merges a 30-minute bar with the one that follows it (09:30 with 10:00, 10:30 with 11:00, and so on), except the final 15:30:00 row, which has nothing after it to merge with.

                         open      high      low     close    volume
t
2020-08-24 09:30:00  514.7900  515.1400  500.000  502.8899  18775884
2020-08-24 10:30:00  502.8190  504.4000  495.745  501.3500  10385806
2020-08-24 11:30:00  501.3400  514.7809  501.250  507.7000   7159132
2020-08-24 12:30:00  507.7000  508.0000  504.000  505.1700   5238818
2020-08-24 13:30:00  505.1707  507.0800  503.120  504.1742   4435539
2020-08-24 14:30:00  504.1800  514.6800  498.620  503.2265   6647980
2020-08-24 15:30:00  503.2330  504.5150  501.546  503.7900   4239235

Any pseudocode would help, thank you.

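Use the offset parameter of DataFrame.resample instead of loffset. offset shifts the bin edges themselves, so the hourly bins run from :30 to :30, whereas loffset only relabels the result after the rows have already been grouped into on-the-hour bins: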

df2 = df.resample('1H', offset='30Min').agg({'open': 'first', 
                                       'high': 'max', 
                                       'low': 'min', 
                                       'close': 'last',
                                       'volume': 'sum'})

By the way, loffset has been deprecated since pandas 1.1.0, the same release that added offset, so you may need to update pandas.
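As an aside, the difference between the two parameters can be seen on a toy series. This is only a minimal sketch: the values and the names s, relabelled and shifted are made up for illustration. Because recent pandas releases no longer accept loffset, the sketch emulates it by adding 30 minutes to the index after resampling, which is the replacement suggested by the pandas deprecation message.

```python
import pandas as pd

# Toy 30-minute series (illustrative values, not the question's data)
idx = pd.date_range("2020-08-24 09:30", periods=4, freq="30min")
s = pd.Series([1, 2, 3, 4], index=idx)

# Default hourly bins start on the hour:
# [09:00, 10:00), [10:00, 11:00), [11:00, 12:00)
relabelled = s.resample("60min").sum()
# Emulate loffset='30Min': shift only the labels, after binning is done
relabelled.index = relabelled.index + pd.Timedelta("30min")

# offset='30min' moves the bin edges instead:
# [09:30, 10:30), [10:30, 11:30)
shifted = s.resample("60min", offset="30min").sum()

print(relabelled)  # 10:00 and 10:30 end up together under the 10:30 label
print(shifted)     # 09:30 and 10:00 end up together under the 09:30 label
```

The relabelled series reproduces the pairing complained about in the question, while the shifted one pairs each bar with the bar that follows it.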

Result df2:

                         open      high      low     close    volume
t                                                                   
2020-08-24 09:30:00  514.7900  515.1400  500.000  502.8899  18775884
2020-08-24 10:30:00  502.8190  504.4000  495.745  501.3500  10385806
2020-08-24 11:30:00  501.3400  514.7809  501.250  507.7000   7159132
2020-08-24 12:30:00  507.7000  508.0000  504.000  505.1700   5238818
2020-08-24 13:30:00  505.1707  507.0800  503.120  504.1742   4435539
2020-08-24 14:30:00  504.1800  514.6800  498.620  503.2265   6647980
2020-08-24 15:30:00  503.2330  504.5150  501.546  503.7900   4239235
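
For a self-contained check, the sketch below rebuilds the 30-minute frame from the question and applies the same resample call. The names rows and ohlcv are illustrative, and '60min' is used in place of '1H' only as an assumption, to sidestep frequency-alias warnings that newer pandas versions may emit; the binning is the same.

```python
import pandas as pd

# The 30-minute bars copied from the question
rows = [
    ("2020-08-24 09:30:00", 514.7900, 515.1400, 502.240, 507.3700, 12123388),
    ("2020-08-24 10:00:00", 507.3200, 513.9800, 500.000, 502.8899, 6652496),
    ("2020-08-24 10:30:00", 502.8190, 503.7700, 495.745, 496.4879, 5925417),
    ("2020-08-24 11:00:00", 496.7865, 504.4000, 495.750, 501.3500, 4460389),
    ("2020-08-24 11:30:00", 501.3400, 508.6300, 501.250, 508.0800, 3743261),
    ("2020-08-24 12:00:00", 508.1100, 514.7809, 506.550, 507.7000, 3415871),
    ("2020-08-24 12:30:00", 507.7000, 507.9000, 504.240, 504.8050, 2864729),
    ("2020-08-24 13:00:00", 504.7250, 508.0000, 504.000, 505.1700, 2374089),
    ("2020-08-24 13:30:00", 505.1707, 506.7220, 503.120, 506.0150, 2207964),
    ("2020-08-24 14:00:00", 506.0700, 507.0800, 503.670, 504.1742, 2227575),
    ("2020-08-24 14:30:00", 504.1800, 514.6800, 501.100, 501.7300, 2676025),
    ("2020-08-24 15:00:00", 501.7100, 503.4200, 498.620, 503.2265, 3971955),
    ("2020-08-24 15:30:00", 503.2330, 504.5150, 501.546, 503.7900, 4239235),
]
df = pd.DataFrame(rows, columns=["t", "open", "high", "low", "close", "volume"])
df["t"] = pd.to_datetime(df["t"])
df = df.set_index("t")

# One aggregation rule per OHLCV column
ohlcv = {"open": "first", "high": "max", "low": "min",
         "close": "last", "volume": "sum"}

# Hourly bins whose edges sit at :30 rather than on the hour
df2 = df.resample("60min", offset="30min").agg(ohlcv)
print(df2)
```

The last bin contains only the 15:30 bar, so that row passes through unchanged.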
