Rolling max/min/sum of a time series over the last x-minute interval

Question

I have a financial time series data.frame with microsecond precision:

timestamp                    price  volume
2017-08-29 08:00:00.345678   99.1   10
2017-08-29 08:00:00.674566   98.2   5
....
2017-08-29 16:00:00.111234   97.0   3
2017-08-29 16:00:01.445678   96.5   5

In total: around 100k records per day.

I saw a couple of functions where I can specify the width of the rolling window, e.g. k = 10. But k is expressed as a number of observations, not minutes.

I need to calculate a running/rolling max and min of the price series and a running/rolling sum of the volume series, like this:

starting with a timestamp exactly 5 minutes after the beginning of the time series
for every following timestamp: look back over a 5-minute interval and
calculate the rolling statistics.

How can I calculate this efficiently?

Answer 1

Your data

I wasn't able to capture milliseconds (but the solution should still work)

library(lubridate)
df <- data.frame(timestamp = ymd_hms("2017-08-29 08:00:00.345678", "2017-08-29 08:00:00.674566", "2017-08-29 16:00:00.111234", "2017-08-29 16:00:01.445678"),
                 price=c(99.1, 98.2, 97.0, 96.5),
                 volume=c(10,5,3,5))
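
A note on the missing fractional seconds: as far as I know, R usually keeps the fraction when it parses such a timestamp and only hides it when printing. The sketch below uses base R's as.POSIXct instead of lubridate and assumes a UTC time zone; the names x, frac and shown are just for illustration.

```r
# Parse one timestamp from the question; tz = "UTC" is an assumption.
x <- as.POSIXct("2017-08-29 08:00:00.345678", tz = "UTC")

# The fractional part is stored in the underlying numeric value.
frac <- as.numeric(x) %% 1

# Printing hides it by default; digits.secs controls how many digits are shown.
op <- options(digits.secs = 6)
shown <- format(x)
options(op)  # restore the previous setting
```

Because the value is a double, the last displayed digit can differ from the input by one unit, so the printed fraction is best treated as approximate.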

purrr and dplyr solution

library(purrr)
library(dplyr)
timeinterval <- 5*60   # 5 minutes, in seconds

Filter df for observations within the time interval, save as a list

mdf <- map(1:nrow(df), ~df[df$timestamp >= df[.x,]$timestamp & df$timestamp < df[.x,]$timestamp+timeinterval,])

Summarise each data.frame in the list

statdf <- map_df(mdf, ~.x %>% 
                          summarise(timestamp = head(timestamp,1),
                                    max.price = max(price), 
                                    max.volume = max(volume),
                                    sum.price = sum(price),
                                    sum.volume = sum(volume),
                                    min.price = min(price), 
                                    min.volume = min(volume)))

Output

                timestamp max.price max.volume sum.price sum.volume
1 2017-08-29 08:00:00      99.1         10     197.3         15
2 2017-08-29 08:00:00      98.2          5      98.2          5
3 2017-08-29 16:00:00      97.0          5     193.5          8
4 2017-08-29 16:00:01      96.5          5      96.5          5
  min.price min.volume
1      98.2          5
2      98.2          5
3      96.5          3
4      96.5          5

Answer 2

As I was looking for a backward calculation (start with a timestamp and look 5 minutes back), I slightly modified the great solution by #CPak as follows:

mdf <- map(1:nrow(df), ~df[df$timestamp <= df[.x,]$timestamp & df$timestamp > df[.x,]$timestamp - timeinterval,])

statdf <- map_df(mdf, ~.x %>% 
                      summarise(timestamp_to = tail(timestamp,1),
                                timestamp_from = head(timestamp,1),
                                max.price = max(price), 
                                min.price = min(price),
                                sum.volume = sum(volume),
                                records = n()))

In addition, I added records = n() to see how many records were used in each interval.

One caveat: on a dataset with 100K+ records, the code takes 10 minutes for mdf and another 6 minutes for statdf.

Any ideas on how to optimize it? Thank you!
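
One possible direction for the optimization question, offered as a sketch rather than a tested benchmark: most of the time probably goes into comparing every timestamp against the whole data.frame for each row. If the data is sorted by timestamp, base R's findInterval can locate each window's first and last row directly, and a cumulative sum gives the rolling volume. The function rolling_stats, its width_secs argument and the full_window column are hypothetical names introduced here; the window is (t - 5 min, t], as in the code above.

```r
# Backward-looking rolling statistics; assumes df is sorted by timestamp.
rolling_stats <- function(df, width_secs = 5 * 60) {
  t <- as.numeric(df$timestamp)
  stopifnot(!is.unsorted(t))
  price <- df$price

  # findInterval(x, t) counts the elements of t that are <= x, so
  # first is the first row with timestamp > t[i] - width_secs.
  first <- findInterval(t - width_secs, t) + 1L
  last  <- findInterval(t, t)          # last row with timestamp <= t[i]

  cs  <- c(0, cumsum(df$volume))       # prefix sums for the rolling sum
  idx <- seq_along(t)

  data.frame(
    timestamp_to   = df$timestamp,
    timestamp_from = df$timestamp[first],
    max.price  = vapply(idx, function(i) max(price[first[i]:last[i]]), numeric(1)),
    min.price  = vapply(idx, function(i) min(price[first[i]:last[i]]), numeric(1)),
    sum.volume = cs[last + 1L] - cs[first],
    records    = last - first + 1L,
    # TRUE once at least width_secs have passed since the first record
    full_window = t - t[1] >= width_secs
  )
}

# The four sample rows from the question, parsed with base R (UTC assumed).
df <- data.frame(
  timestamp = as.POSIXct(c("2017-08-29 08:00:00.345678", "2017-08-29 08:00:00.674566",
                           "2017-08-29 16:00:00.111234", "2017-08-29 16:00:01.445678"),
                         tz = "UTC"),
  price  = c(99.1, 98.2, 97.0, 96.5),
  volume = c(10, 5, 3, 5)
)
res <- rolling_stats(df)
```

The max and min still scan each window, so the cost grows with the number of records per window, but no data.frame is subset per row. Filtering res on full_window keeps only the timestamps that are at least 5 minutes after the start of the series, as asked in the question.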
