Resample OHLC data with pandas

There are a lot of similar questions, each with its own specific issues and answers, but I haven't found a solution that fits, nor do I understand how to do it.

I have typical data:

date        open    high    low     close   volume      spot
1507842000  5313.3  5345.6  5272    5295.1  22612561    5301.462201
1507845600  5295.1  5326.7  5286.1  5301.1  12127159    5308.487754
1507849200  5301.1  5467.5  5301.1  5464.5  54568881    5401.331605
1507852800  5464.7  5497    5394.9  5402.5  58411322    5446.552171
1507856400  5402.1  5542    5402.1  5541.2  50272286    5466.652636
1507860000  5540.4  5980    5440.1  5694.5  182746217   5717.856124
1507863600  5689.8  5800    5604.5  5739.6  78341266    5709.488508
1507867200  5742    5897    5713.1  5753.2  79738461    5794.402674
1507870800  5753.1  5798.9  5520.3  5574.5  87621428    5640.727381
1507874400  5574.6  5672.6  5503.2  5608.4  56964404    5591.237093
1507878000  5607.5  5689.1  5570    5660    46132190    5640.761482
1507881600  5660    5743    5634.8  5652    50173714    5690.219952

It is not just OHLC, though: there are also volume and spot price columns.

I am trying to resample the hourly rows into daily rows.

So, I load the CSV:

data_hourly = pd.read_csv('../data/hourly.csv', parse_dates=True, date_parser=date_parse, index_col=0, header=0)

(The date_parse function removes the minutes/seconds.)
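The date_parse helper itself is not shown. As a minimal sketch of one way to end up with an hourly DatetimeIndex, assuming the date column holds Unix timestamps in seconds (UTC) and using an in-memory CSV as a stand-in for hourly.csv:

```python
import io

import pandas as pd

# Two rows copied from the sample above; this string stands in for hourly.csv.
csv_text = """date,open,high,low,close,volume,spot
1507842000,5313.3,5345.6,5272,5295.1,22612561,5301.462201
1507845600,5295.1,5326.7,5286.1,5301.1,12127159,5308.487754
"""

data_hourly = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Assumption: the integers are seconds since the Unix epoch.
data_hourly.index = pd.to_datetime(data_hourly.index, unit="s")

# Truncate to the hour, mirroring what date_parse is described as doing.
data_hourly.index = data_hourly.index.floor("h")

print(data_hourly.index)
```

resample() needs a datetime-like index, so the conversion has to happen before any of the resampling calls.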

I tried:

data_daily = data_hourly.resample('1D').ohlc()

This clearly doesn't work at all; it gives me rows with a large number of columns.
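The extra columns appear because, when called on a DataFrame, .ohlc() computes open/high/low/close separately for every column. A small sketch with made-up values (the frame below is illustrative, not the data above):

```python
import pandas as pd

# Illustrative hourly frame: four rows spanning midnight, two columns.
idx = pd.date_range("2017-10-12 21:00", periods=4, freq="h")
df = pd.DataFrame(
    {"open": [1.0, 2.0, 3.0, 4.0], "volume": [10, 20, 30, 40]},
    index=idx,
)

wide = df.resample("1D").ohlc()

# Each input column expands into four sub-columns under a MultiIndex:
# ('open', 'open'), ('open', 'high'), ..., ('volume', 'close')
print(wide.columns.tolist())
print(wide.shape)
```

With six input columns, as in the hourly data, the same call would produce 24 columns, which is why .ohlc() suits a single price series better than a frame that is already in OHLC form.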

Then I tried:

columns_dict = {'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volume': 'sum', 'spot': 'average'}

data_daily = data_hourly.resample('1D', how=columns_dict)

but this crashes with an error:

"%r object has no attribute %r" % (type(self).__name__, attr)
AttributeError: 'SeriesGroupBy' object has no attribute 'average'

Besides, it tells me the 'how' argument is deprecated anyway, but I couldn't find an example of doing it the 'new' way.

You are close: you need 'mean' instead of 'average', and you pass the dict to Resampler.agg:

columns_dict = {'open': 'first', 'high': 'max', 'low': 'min',
                'close': 'last', 'volume': 'sum', 'spot': 'mean'}
data_daily = data_hourly.resample('1D').agg(columns_dict)
print(data_daily)
              open    high     low   close     volume         spot
date                                                              
2017-10-12  5313.3  5467.5  5272.0  5464.5   89308601  5337.093853
2017-10-13  5464.7  5980.0  5394.9  5652.0  690401288  5633.099780
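For anyone who wants to reproduce this without the CSV file, here is a self-contained sketch that rebuilds the sample rows in memory. It assumes the date column is Unix seconds (UTC) and that the columns are whitespace-separated as shown in the question; the variable names are just for illustration.

```python
import io

import pandas as pd

# The twelve sample rows from the question, whitespace-separated.
raw = """date open high low close volume spot
1507842000 5313.3 5345.6 5272 5295.1 22612561 5301.462201
1507845600 5295.1 5326.7 5286.1 5301.1 12127159 5308.487754
1507849200 5301.1 5467.5 5301.1 5464.5 54568881 5401.331605
1507852800 5464.7 5497 5394.9 5402.5 58411322 5446.552171
1507856400 5402.1 5542 5402.1 5541.2 50272286 5466.652636
1507860000 5540.4 5980 5440.1 5694.5 182746217 5717.856124
1507863600 5689.8 5800 5604.5 5739.6 78341266 5709.488508
1507867200 5742 5897 5713.1 5753.2 79738461 5794.402674
1507870800 5753.1 5798.9 5520.3 5574.5 87621428 5640.727381
1507874400 5574.6 5672.6 5503.2 5608.4 56964404 5591.237093
1507878000 5607.5 5689.1 5570 5660 46132190 5640.761482
1507881600 5660 5743 5634.8 5652 50173714 5690.219952
"""

data_hourly = pd.read_csv(io.StringIO(raw), sep=r"\s+", index_col="date")

# Assumption: the integers are seconds since the Unix epoch.
data_hourly.index = pd.to_datetime(data_hourly.index, unit="s")

# One aggregation per column: first open, highest high, lowest low,
# last close, total volume, average spot.
columns_dict = {
    "open": "first",
    "high": "max",
    "low": "min",
    "close": "last",
    "volume": "sum",
    "spot": "mean",
}

data_daily = data_hourly.resample("1D").agg(columns_dict)
print(data_daily)
```

The first timestamp falls at 21:00 on 2017-10-12, so only three hourly rows land in the first daily bin and the remaining nine land in the second, which matches the volume totals in the output above.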
