Using resample to aggregate data with different rules for different columns in a pandas dataframe

I have a dataframe of the classic "open high low close volume" type, so common in finance, with each row representing 1 minute. There are 720 rows. I gather it with this code from Kraken:

import urllib.request, json
import pandas as pd

# Fetch 1-minute OHLC data for the XBT/EUR pair from Kraken's public API
with urllib.request.urlopen("https://api.kraken.com/0/public/OHLC?pair=XXBTZEUR&interval=1") as url:
    data = json.loads(url.read().decode())

columns = ['time', 'open', 'high', 'low', 'close', 'vwap', 'volume', 'count']
data_DF = pd.DataFrame(data['result']['XXBTZEUR'], columns=columns)

# Kraken returns the numeric fields as strings, so cast them explicitly
data_DF['open'] = data_DF['open'].astype(float)
data_DF['high'] = data_DF['high'].astype(float)
data_DF['low'] = data_DF['low'].astype(float)
data_DF['close'] = data_DF['close'].astype(float)
data_DF['volume'] = data_DF['volume'].astype(float)
data_DF['vwap'] = data_DF['vwap'].astype(float)
data_DF['count'] = data_DF['count'].astype(int)

# Use the timestamp (seconds since the epoch) as a DatetimeIndex
data_DF['time'] = pd.to_datetime(data_DF['time'], unit='s')
data_DF.set_index('time', inplace=True)
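As an aside, the per-column casts above can be written more compactly; this is just a sketch of an equivalent conversion using the same column names:

float_cols = ['open', 'high', 'low', 'close', 'vwap', 'volume']
data_DF[float_cols] = data_DF[float_cols].astype(float)  # Kraken returns these as strings
data_DF['count'] = data_DF['count'].astype(int)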

I now need to aggregate it over different time periods. To keep things simple, let us take just the classic 5 minutes. Each column must be generated according to a different rule:
The open column must be the first value of the open column values of the sample;
The close column must be the last value of the close column values of the sample;
The high must be the max of the high column values of the sample;
The low must be the min of the low column values of the sample;

I tried

data_DF5=data_DF['vwap'].resample('5Min').ohlc()

but it creates a series of open high low close for each column. Hmm, not what I was looking for.

I tried:

data_DF5=data_DF['time'].resample('5Min')
data_DF5['volume']=data_DF['volume'].resample('5Min').sum()
data_DF5['open']=data_DF['open'].resample('5Min').first()
data_DF5['close']=data_DF['close'].resample('5Min').last()
data_DF5['high']=data_DF['high'].resample('5Min').max()
data_DF5['low']=data_DF['low'].resample('5Min').min()

With the intent of building the dataframe one column at a time.

And I get a

"Unable to open 'hashtable_class_helper.pxi': File not found" error which I cannot understand. If I change the first line with

data_DF5=data_DF['vwap'].resample('5Min').mean()

I get a dataframe which I cannot even interpret [see (*)].

And if I change the first line with

data_DF5=data_DF['vwap'].resample('5Min')

I get:

'DatetimeIndexResampler' object does not support item assignment.

I am really at a loss. I have looked at other stackoverflow questions, but none seem to cover this case. The manual page also does not seem to be clear on how to solve this.

(*)

time
2018-12-29 07:05:00    3417.8
2018-12-29 07:10:00    3411.12
2018-12-29 07:15:00    3408.98
2018-12-29 07:20:00    3409.46
2018-12-29 07:25:00    3409.26
2018-12-29 07:30:00    2729.18
2018-12-29 07:35:00    3413.9
2018-12-29 07:40:00    2739.32
2018-12-29 07:45:00    3426.12
2018-12-29 07:50:00    3423.46
2018-12-29 07:55:00    3433.22
...
2018-12-29 19:00:00    3357.64
2018-12-29 19:05:00    3356
volume    time
2018-12-29 07:05:00    0.112311
2018-12-...
open      time
2018-12-29 07:05:00    3418.9
2018-12-29 ...
close     time
2018-12-29 07:05:00    3416.8
2018-12-29 ...
high      time
2018-12-29 07:05:00    3418.9
2018-12-29 ...
low       time
2018-12-29 07:05:00    3416.8
2018-12-29 ...
Name: vwap, Length: 150, dtype: object

I think you need pd.Grouper

data_DF = data_DF.groupby(pd.Grouper(freq='5min')).agg({'open':'first',
                                                        'close':'last',
                                                        'high':'max',
                                                        'low':'min'})

                       open   close    high     low
time                                               
2018-12-29 07:30:00  3411.4  3413.9  3413.9  3411.4
2018-12-29 07:35:00  3413.9  3413.1  3416.1  3411.9
2018-12-29 07:40:00  3413.1  3422.9  3427.5  3413.1
2018-12-29 07:45:00  3421.1  3423.8  3431.7  3418.0
2018-12-29 07:50:00  3423.8  3428.2  3428.2  3418.9
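For reference, the same result can also be obtained directly from the resampler with .agg(), and the volume column from the question can be folded in with a sum at the same time; a minimal sketch, assuming the data_DF built in the question:

# Equivalent 5-minute aggregation via resample().agg(); volume is summed as well
data_DF5 = data_DF.resample('5Min').agg({'open': 'first',
                                         'close': 'last',
                                         'high': 'max',
                                         'low': 'min',
                                         'volume': 'sum'})

The column-by-column attempts in the question fail because resample('5Min') returns a DatetimeIndexResampler rather than a DataFrame, so it does not support item assignment; the individual .first()/.last()/.max()/.min() results would have to be assigned into a plain DataFrame instead.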
