
Calculating daily difference for 15 minutes data in pandas

I have a huge DataFrame of open and close prices recorded every 15 minutes. Each day starts at 9:45 and ends at 16:15. My current df looks like this:

                      open_p  close_p
date                                
2013-12-20 09:45:00   -1.14    -1.12
2013-12-20 10:00:00   -1.12    -1.12
2013-12-20 10:15:00   -1.12    -1.11
2013-12-20 10:30:00   -1.11    -1.10
...
2013-12-20 15:30:00   -1.13    -1.14
2013-12-20 15:45:00   -1.14    -1.14
2013-12-20 16:00:00   -1.13    -1.06
2013-12-20 16:15:00   -1.05    -1.01
2013-12-23 09:45:00   -1.02    -1.02
2013-12-23 10:00:00   -1.02    -1.02
2013-12-23 10:15:00   -1.03    -1.07
2013-12-23 10:30:00   -1.06    -1.08
...
2013-12-23 15:30:00   -1.11    -1.14
2013-12-23 15:45:00   -1.13    -1.12
2013-12-23 16:00:00   -1.12    -1.09
2013-12-23 16:15:00   -1.09    -1.13
...

I would like to calculate the difference between close_p at 16:15 and open_p at 9:45 for each day. For example, the daily change for 2013-12-20 equals -1.01 - (-1.14) = 0.13. The result should look like this:

                      open_p  close_p  daily_change
date                                
2013-12-20 09:45:00   -1.14    -1.12     0.13
2013-12-20 10:00:00   -1.12    -1.12     0.13
2013-12-20 10:15:00   -1.12    -1.11     0.13
2013-12-20 10:30:00   -1.11    -1.10     0.13
...
2013-12-20 15:30:00   -1.13    -1.14     0.13
2013-12-20 15:45:00   -1.14    -1.14     0.13
2013-12-20 16:00:00   -1.13    -1.06     0.13
2013-12-20 16:15:00   -1.05    -1.01     0.13
2013-12-23 09:45:00   -1.02    -1.02    -0.11
2013-12-23 10:00:00   -1.02    -1.02    -0.11
2013-12-23 10:15:00   -1.03    -1.07    -0.11
2013-12-23 10:30:00   -1.06    -1.08    -0.11
...
2013-12-23 15:30:00   -1.11    -1.14    -0.11
2013-12-23 15:45:00   -1.13    -1.12    -0.11
2013-12-23 16:00:00   -1.12    -1.09    -0.11
2013-12-23 16:15:00   -1.09    -1.13    -0.11

What's the fastest and most convenient way of getting this done?

You can group by date, aggregate open_p with first and close_p with last, then take the difference between the two columns:

import pandas as pd

# One row per calendar day: first open, last close, then close_p - open_p.
print(df.groupby(pd.Grouper(freq="D"))
        .agg({"open_p": "first", "close_p": "last"})
        .diff(axis=1)["close_p"])

date
2013-12-20    0.13
2013-12-21     NaN
2013-12-22     NaN
2013-12-23   -0.11
Freq: D, Name: close_p, dtype: float64
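
The NaN rows in the output above are the weekend days that pd.Grouper(freq="D") fills in between the two trading days. If those rows are unwanted, one possible variation (a sketch, not part of the answer above) is to group on the normalized index, which only produces groups for days that actually occur in the data. The sample frame copies a few rows from the question; the names `day`, `daily` and `daily_change` are chosen here for illustration.

```python
import pandas as pd

# A few rows copied from the question (first and last two bars of each day).
idx = pd.to_datetime([
    "2013-12-20 09:45", "2013-12-20 10:00", "2013-12-20 16:00", "2013-12-20 16:15",
    "2013-12-23 09:45", "2013-12-23 10:00", "2013-12-23 16:00", "2013-12-23 16:15",
])
df = pd.DataFrame(
    {"open_p": [-1.14, -1.12, -1.13, -1.05, -1.02, -1.02, -1.12, -1.09],
     "close_p": [-1.12, -1.12, -1.06, -1.01, -1.02, -1.02, -1.09, -1.13]},
    index=idx,
).rename_axis("date")

# Midnight timestamp of each row; only days present in the index become groups.
day = df.index.normalize()

daily = df.groupby(day).agg({"open_p": "first", "close_p": "last"})

# Rounding hides floating-point noise such as 0.13000000000000012.
daily_change = (daily["close_p"] - daily["open_p"]).round(2)
print(daily_change)
```

The result has one entry per trading day and no entries for 2013-12-21 or 2013-12-22.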

Use GroupBy.transform with the last and first aggregations, then subtract the results to create the new column:

g = df.groupby(pd.Grouper(freq='D'))
# Broadcast each day's last close and first open to every row, then subtract.
df['daily_change'] = g['close_p'].transform('last').sub(g['open_p'].transform('first'))
print(df)
                     open_p  close_p  daily_change
date                                              
2013-12-20 09:45:00   -1.14    -1.12          0.13
2013-12-20 10:00:00   -1.12    -1.12          0.13
2013-12-20 10:15:00   -1.12    -1.11          0.13
2013-12-20 10:30:00   -1.11    -1.10          0.13
2013-12-20 15:30:00   -1.13    -1.14          0.13
2013-12-20 15:45:00   -1.14    -1.14          0.13
2013-12-20 16:00:00   -1.13    -1.06          0.13
2013-12-20 16:15:00   -1.05    -1.01          0.13
2013-12-23 09:45:00   -1.02    -1.02         -0.11
2013-12-23 10:00:00   -1.02    -1.02         -0.11
2013-12-23 10:15:00   -1.03    -1.07         -0.11
2013-12-23 10:30:00   -1.06    -1.08         -0.11
2013-12-23 15:30:00   -1.11    -1.14         -0.11
2013-12-23 15:45:00   -1.13    -1.12         -0.11
2013-12-23 16:00:00   -1.12    -1.09         -0.11
2013-12-23 16:15:00   -1.09    -1.13         -0.11
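
A quick way to check the transform result on real data is to confirm that every day carries a single value and that it matches the worked example from the question. The following is only a sketch under the assumption that the index is a DatetimeIndex named date; the sample rows are copied from the question, and `per_day` and `first_day` are illustrative names.

```python
import pandas as pd

# A few rows copied from the question (first and last two bars of each day).
idx = pd.to_datetime([
    "2013-12-20 09:45", "2013-12-20 10:00", "2013-12-20 16:00", "2013-12-20 16:15",
    "2013-12-23 09:45", "2013-12-23 10:00", "2013-12-23 16:00", "2013-12-23 16:15",
])
df = pd.DataFrame(
    {"open_p": [-1.14, -1.12, -1.13, -1.05, -1.02, -1.02, -1.12, -1.09],
     "close_p": [-1.12, -1.12, -1.06, -1.01, -1.02, -1.02, -1.09, -1.13]},
    index=idx,
).rename_axis("date")

g = df.groupby(pd.Grouper(freq="D"))
df["daily_change"] = g["close_p"].transform("last").sub(g["open_p"].transform("first"))

# Every calendar day should hold exactly one distinct daily_change value.
per_day = df.groupby(df.index.normalize())["daily_change"].nunique()
assert (per_day == 1).all()

# Worked example from the question: -1.01 - (-1.14) = 0.13 for 2013-12-20.
# Compare with a tolerance because the subtraction is done in floating point.
first_day = df.loc["2013-12-20", "daily_change"]
assert (first_day - 0.13).abs().lt(1e-9).all()
```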

Another idea is to use Series.at_time, strip the times by converting the DatetimeIndex labels to dates, and finally use Series.map:

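# Turn each timestamp label into its calendar date.
f = lambda x: x.date()
# Close at 16:15 minus open at 09:45, both re-indexed by date.
s = (df['close_p'].at_time('16:15:00').rename(f)
       .sub(df.at_time('09:45:00').rename(f)['open_p']))

# Look up each row's date in s.
df['daily_change'] = df.index.to_frame()['date'].dt.date.map(s)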

print (df)
                     open_p  close_p  daily_change
date                                              
2013-12-20 09:45:00   -1.14    -1.12          0.13
2013-12-20 10:00:00   -1.12    -1.12          0.13
2013-12-20 10:15:00   -1.12    -1.11          0.13
2013-12-20 10:30:00   -1.11    -1.10          0.13
2013-12-20 15:30:00   -1.13    -1.14          0.13
2013-12-20 15:45:00   -1.14    -1.14          0.13
2013-12-20 16:00:00   -1.13    -1.06          0.13
2013-12-20 16:15:00   -1.05    -1.01          0.13
2013-12-23 09:45:00   -1.02    -1.02         -0.11
2013-12-23 10:00:00   -1.02    -1.02         -0.11
2013-12-23 10:15:00   -1.03    -1.07         -0.11
2013-12-23 10:30:00   -1.06    -1.08         -0.11
2013-12-23 15:30:00   -1.11    -1.14         -0.11
2013-12-23 15:45:00   -1.13    -1.12         -0.11
2013-12-23 16:00:00   -1.12    -1.09         -0.11
2013-12-23 16:15:00   -1.09    -1.13         -0.11
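
One difference between the approaches may matter on real data: at_time only matches exact timestamps, so a day without a 16:15 bar gets NaN, while first/last simply uses whatever bars open and close that day. The sketch below illustrates this with a hypothetical sample in which the second day ends at 16:00 (that truncated day is an assumption, not data from the question). It uses index normalization and reindex rather than the rename/map calls above; the column names `by_time` and `by_first_last` are made up for the comparison.

```python
import pandas as pd

# Hypothetical sample: 2013-12-23 has no 16:15 bar.
idx = pd.to_datetime([
    "2013-12-20 09:45", "2013-12-20 16:15",
    "2013-12-23 09:45", "2013-12-23 16:00",
])
df = pd.DataFrame(
    {"open_p": [-1.14, -1.05, -1.02, -1.12],
     "close_p": [-1.12, -1.01, -1.02, -1.09]},
    index=idx,
).rename_axis("date")

day = df.index.normalize()  # midnight timestamp of each row

# Exact-time lookup: only the 16:15 close and the 09:45 open count.
close_1615 = df["close_p"].at_time("16:15")
open_0945 = df["open_p"].at_time("09:45")
close_1615.index = close_1615.index.normalize()
open_0945.index = open_0945.index.normalize()
by_time = close_1615.sub(open_0945).round(2)  # NaN where either bar is missing
df["by_time"] = by_time.reindex(day).to_numpy()

# First/last lookup: whatever bar comes first and last on each day.
g = df.groupby(day)
df["by_first_last"] = (
    g["close_p"].transform("last") - g["open_p"].transform("first")
).round(2)

print(df)
```

Which behaviour is preferable depends on whether a short trading day should be flagged as missing or still produce a value.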
