Pandas groupby given time interval
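I have a dataframe sampled at 5-minute intervals. I want to calculate the return over a user-specified intraday time window. For example, I want to know the return from 2:55 pm to 3:00 pm on each day. How do I perform a groupby operation for this?
My data looks like this: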
Open High Low Close Volume VB VS MCVol MCVal OpenInt Ret TotalRet
date
2019-07-12 14:40:00+08:00 0.411629 0.412154 0.411366 0.411891 2412.0 408971474.0 414685176.0 315290.0 3.007625e+10 556102.0 25.0 41975.0
2019-07-12 14:45:00+08:00 0.411891 0.413205 0.411629 0.412942 6536.0 408975390.0 414687662.0 318884.0 3.041836e+10 556200.0 100.0 42075.0
2019-07-12 14:50:00+08:00 0.412942 0.413205 0.411891 0.412680 3288.0 408976658.0 414689656.0 320962.0 3.061613e+10 555638.0 -25.0 42050.0
2019-07-12 14:55:00+08:00 0.412680 0.414254 0.412417 0.413992 5926.0 408980236.0 414691758.0 324190.0 3.092359e+10 555482.0 125.0 42175.0
2019-07-12 15:00:00+08:00 0.413729 0.413992 0.412417 0.412942 8190.0 408983450.0 414696208.0 329278.0 3.140824e+10 553480.0 -100.0 42075.0
Most of the similar questions are solved with resampling, and I am not sure that fits my case. Mine is more about using between_time() together with groupby.
Thanks,
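My solution (assuming the Close column is used for the return calculation). Edit: since the price series is recorded continuously every five minutes, I am not sure what you mean by groupby. Do you mean grouping different stocks with a start time of 14:55?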
import pandas as pd
ind = pd.date_range('2019-07-12 14:40', periods=5, freq='5min')
stock = {'Close':[0.411891,0.412942,0.412680,0.413992,0.412942]}
df = pd.DataFrame(data=stock,index=ind)
print(df)
output = df.between_time('14:55','15:00')['Close'].pct_change().iloc[1]
print('Return-----')
print(output)
Output
Close
2019-07-12 14:40:00 0.411891
2019-07-12 14:45:00 0.412942
2019-07-12 14:50:00 0.412680
2019-07-12 14:55:00 0.413992
2019-07-12 15:00:00 0.412942
Return-----
-0.0025362808943169
Edit 2: I see what you mean now. Try this.
import pandas as pd
ind = pd.date_range('2019-07-12 14:40', periods=5, freq='5min')
ind = ind.append(pd.date_range('2019-07-13 14:40', periods=5, freq='5min'))
stock = {'Close':[0.411891,0.412942,0.412680,0.413992,0.412942,0.423567,0.456321,0.465789,0.431900,0.431672]}
df = pd.DataFrame(data=stock,index=ind)
df.rename_axis('date',inplace=True)
df['date_'] = df.index.date
print(df)
print(df.between_time('14:55','15:00')['Close'])
output = df.between_time('14:55','15:00').groupby('date_').pct_change().iloc[1::2]
print('Return-----')
print(output)
Output
Return-----
Close
date
2019-07-12 15:00:00 -0.002536
2019-07-13 15:00:00 -0.000528
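The `.iloc[1::2]` step above relies on every day contributing exactly two rows to the window. As a minimal sketch of an alternative, assuming the same toy data and that the return is defined as last close over first close within the window, the per-day return can be computed directly from each group's first and last value (the names `window`, `by_day` and `daily_ret` are illustrative, not from the original answer):

```python
import pandas as pd

# Same toy data as in the answer above: two days of 5-minute closes.
ind = pd.date_range('2019-07-12 14:40', periods=5, freq='5min')
ind = ind.append(pd.date_range('2019-07-13 14:40', periods=5, freq='5min'))
close = [0.411891, 0.412942, 0.412680, 0.413992, 0.412942,
         0.423567, 0.456321, 0.465789, 0.431900, 0.431672]
df = pd.DataFrame({'Close': close}, index=ind).rename_axis('date')

# Keep only the bars inside the window, then group them by calendar day.
window = df.between_time('14:55', '15:00')['Close']
by_day = window.groupby(window.index.date)

# Return per day: last close in the window divided by first close, minus one.
daily_ret = by_day.last() / by_day.first() - 1
print(daily_ret)
```

Because it only looks at the first and last bar of each day's window, this version does not depend on how many 5-minute bars the window contains.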
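Edit 3: This new code works for windows that cross midnight and for returns over more than 5 minutes. I use a 15-minute window (three 5-minute bars, 23:55-00:05) as an example.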
import pandas as pd
ind = pd.date_range('2019-07-12 23:50', periods=5, freq='5min')
ind = ind.append(pd.date_range('2019-07-13 23:50', periods=5, freq='5min'))
stock = {'Close':[0.411891,0.412942,0.412680,0.413992,0.412942,0.423567,0.456321,0.465789,0.431900,0.431672]}
df = pd.DataFrame(data=stock,index=ind)
df.rename_axis('date',inplace=True)
print(df)
# num is set manually here; you could improve this by writing a function that
# calculates the number of five-minute intervals in the window.
num = 2
ind = df.between_time('23:55','00:05').iloc[::num+1]['Close'].index
old_price = df.between_time('23:55','00:05')['Close'].iloc[::num+1].reset_index(drop=True)
new_price = df.between_time('23:55','00:05')['Close'].iloc[num::num+1].reset_index(drop=True)
#print(old_price)
#print(new_price)
output = pd.DataFrame(new_price/old_price-1)
output.set_index(ind,inplace=True)
print('Return-----')
print(output)
Output
Close
date
2019-07-12 23:50:00 0.411891
2019-07-12 23:55:00 0.412942
2019-07-13 00:00:00 0.412680
2019-07-13 00:05:00 0.413992
2019-07-13 00:10:00 0.412942
2019-07-13 23:50:00 0.423567
2019-07-13 23:55:00 0.456321
2019-07-14 00:00:00 0.465789
2019-07-14 00:05:00 0.431900
2019-07-14 00:10:00 0.431672
Return-----
Close
date
2019-07-12 23:55:00 0.002543
2019-07-13 23:55:00 -0.053517
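One possible way to avoid setting `num` by hand is to shift every timestamp back by the window's start time, so that a window crossing midnight falls on a single calendar date and can be grouped normally. The sketch below assumes `start` and `end` are given as 'HH:MM' strings and that the window is shorter than 24 hours; `window_returns` is a hypothetical helper, not part of pandas or the original answer:

```python
import pandas as pd

ind = pd.date_range('2019-07-12 23:50', periods=5, freq='5min')
ind = ind.append(pd.date_range('2019-07-13 23:50', periods=5, freq='5min'))
close = [0.411891, 0.412942, 0.412680, 0.413992, 0.412942,
         0.423567, 0.456321, 0.465789, 0.431900, 0.431672]
df = pd.DataFrame({'Close': close}, index=ind).rename_axis('date')


def window_returns(prices, start, end):
    """Hypothetical helper: return of each daily window, first bar to last bar.

    Assumes `prices` is a Series with a DatetimeIndex and that `start` and
    `end` are 'HH:MM' strings, as accepted by between_time.
    """
    window = prices.between_time(start, end)
    # Shift timestamps back by the start time so that a window crossing
    # midnight maps onto one calendar date per occurrence.
    offset = pd.Timedelta(start + ':00')
    session = (window.index - offset).normalize()
    grouped = window.groupby(session)
    ret = grouped.last() / grouped.first() - 1
    # Label each result with the nominal start time of its window.
    ret.index = ret.index + offset
    return ret


ret = window_returns(df['Close'], '23:55', '00:05')
print('Return-----')
print(ret)
```

On the sample data this gives the same two values as the manual `num` version, and the grouping no longer depends on the number of bars inside the window.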