Using resample to aggregate data with different rules for different columns in a pandas dataframe
I have a dataframe of the classic "open high low close volume" data type, so common in finance, with each row being 1 minute; 720 rows in total. I gather it with this code from Kraken:
import urllib.request, json
import pandas as pd

with urllib.request.urlopen("https://api.kraken.com/0/public/OHLC?pair=XXBTZEUR&interval=1") as url:
    data = json.loads(url.read().decode())

columns = ['time', 'open', 'high', 'low', 'close', 'vwap', 'volume', 'count']
data_DF = pd.DataFrame(data['result']['XXBTZEUR'], columns=columns)
data_DF['open'] = data_DF['open'].astype(float)
data_DF['high'] = data_DF['high'].astype(float)
data_DF['low'] = data_DF['low'].astype(float)
data_DF['close'] = data_DF['close'].astype(float)
data_DF['volume'] = data_DF['volume'].astype(float)
data_DF['vwap'] = data_DF['vwap'].astype(float)
data_DF['count'] = data_DF['count'].astype(int)
data_DF['time'] = pd.to_datetime(data_DF['time'], unit='s')
data_DF.set_index('time', inplace=True)
I now need to aggregate it over different time periods. To keep things simple, let us consider just the classic 5 minutes. Each column must be generated according to a different rule:
The open column must be the first value of the open column values of the sample;
the close column must be the last value of the close column values of the sample;
the high must be the max of the high column values of the sample;
the low must be the min of the low column values of the sample.
I tried
data_DF5 = data_DF['vwap'].resample('5Min').ohlc()
but it creates a series of open, high, low and close values for each column. Hmm, not what I was looking for.
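For context, here is a minimal, self-contained sketch (on synthetic data, not the Kraken feed) of what `.ohlc()` does when applied to a single series: it computes the open/high/low/close of that one series per bin, rather than applying a different rule to each existing column.

```python
import pandas as pd
import numpy as np

# Ten 1-minute bars of a single synthetic series
idx = pd.date_range('2018-12-29 07:00', periods=10, freq='1min')
s = pd.Series(np.arange(10, dtype=float), index=idx, name='vwap')

# .ohlc() derives four new columns from this one series per 5-minute bin
out = s.resample('5Min').ohlc()
print(out)
#                      open  high  low  close
# 2018-12-29 07:00:00   0.0   4.0  0.0    4.0
# 2018-12-29 07:05:00   5.0   9.0  5.0    9.0
```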
I tried:
data_DF5=data_DF['time'].resample('5Min')
data_DF5['volume']=data_DF['volume'].resample('5Min').sum()
data_DF5['open']=data_DF['open'].resample('5Min').first()
data_DF5['close']=data_DF['close'].resample('5Min').last()
data_DF5['high']=data_DF['high'].resample('5Min').max()
data_DF5['low']=data_DF['low'].resample('5Min').min()
with the intent of building the dataframe one column at a time.
And I get an "Unable to open 'hashtable_class_helper.pxi': File not found" error which I cannot understand. If I change the first line to
data_DF5=data_DF['vwap'].resample('5Min').mean()
I get a dataframe which I cannot even interpret [see (*)]. And if I change the first line to
data_DF5=data_DF['vwap'].resample('5Min')
I get:
'DatetimeIndexResampler' object does not support item assignment.
I am really at a loss. I have looked at other Stack Overflow questions, but none seem to cover this case. The manual page also does not seem to be clear on how to solve this.
(*)

time
2018-12-29 07:05:00    3417.8
2018-12-29 07:10:00    3411.12
2018-12-29 07:15:00    3408.98
2018-12-29 07:20:00    3409.46
2018-12-29 07:25:00    3409.26
...
2018-12-29 19:00:00    3357.64
2018-12-29 19:05:00    3356
volume  time 2018-12-29 07:05:00    0.112311  2018-12-...
open    time 2018-12-29 07:05:00    3418.9    2018-12-29 ...
close   time 2018-12-29 07:05:00    3416.8    2018-12-29 ...
high    time 2018-12-29 07:05:00    3418.9    2018-12-29 ...
low     time 2018-12-29 07:05:00    3416.8    2018-12-29 ...
Name: vwap, Length: 150, dtype: object
I think you need pd.Grouper:
data_DF = data_DF.groupby(pd.Grouper(freq='5min')).agg({'open': 'first',
                                                        'close': 'last',
                                                        'high': 'max',
                                                        'low': 'min'})
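As a side note (an equivalent the original answer doesn't show): since the index is a DatetimeIndex, `resample(...).agg(...)` accepts the same per-column dict, and the volume can be summed in the same pass. A minimal sketch on synthetic data:

```python
import pandas as pd
import numpy as np

# Synthetic 1-minute OHLCV data standing in for the Kraken frame
idx = pd.date_range('2018-12-29 07:00', periods=10, freq='1min')
df = pd.DataFrame({
    'open':   np.arange(10.0),
    'high':   np.arange(10.0) + 1,
    'low':    np.arange(10.0) - 1,
    'close':  np.arange(10.0) + 0.5,
    'volume': np.ones(10),
}, index=idx)

# resample + agg applies a different rule to each column in one pass
df5 = df.resample('5Min').agg({'open': 'first',
                               'close': 'last',
                               'high': 'max',
                               'low': 'min',
                               'volume': 'sum'})
print(df5)
```

`groupby(pd.Grouper(freq='5min'))` and `resample('5Min')` produce the same bins here; `resample` just reads more naturally for time-series downsampling.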
open close high low
time
2018-12-29 07:30:00 3411.4 3413.9 3413.9 3411.4
2018-12-29 07:35:00 3413.9 3413.1 3416.1 3411.9
2018-12-29 07:40:00 3413.1 3422.9 3427.5 3413.1
2018-12-29 07:45:00 3421.1 3423.8 3431.7 3418.0
2018-12-29 07:50:00 3423.8 3428.2 3428.2 3418.9