
R - rolling window over data.table

I have the following data.table:

          time       id type   price      size  api start.point  end.point
 1: 1399672906 37119594  ASK 440.002 1.4840000 TRUE  1399672606 1399672906
 2: 1399672940 37119597  BID 441.000 0.1758830 TRUE  1399672640 1399672940
 3: 1399672940 37119598  BID 441.000 0.0491166 TRUE  1399672640 1399672940
 4: 1399673105 37119638  ASK 440.002 0.1313700 TRUE  1399672805 1399673105
 5: 1399673198 37119668  BID 441.000 0.0233013 TRUE  1399672898 1399673198
 6: 1399673198 37119669  BID 441.000 0.9744230 TRUE  1399672898 1399673198
 7: 1399673208 37119675  BID 441.000 0.1587060 TRUE  1399672908 1399673208
 8: 1399673208 37119676  BID 441.000 0.1238870 TRUE  1399672908 1399673208
 9: 1399673208 37119677  BID 441.001 0.0100000 TRUE  1399672908 1399673208
10: 1399673208 37119678  BID 441.175 0.0129740 TRUE  1399672908 1399673208
11: 1399673208 37119679  BID 441.192 0.0100000 TRUE  1399672908 1399673208
12: 1399673208 37119680  BID 441.399 0.0129740 TRUE  1399672908 1399673208
13: 1399673208 37119681  BID 441.499 1.7500000 TRUE  1399672908 1399673208
14: 1399673208 37119682  BID 441.500 8.0214600 TRUE  1399672908 1399673208
15: 1399673241 37119691  BID 441.500 0.0453001 TRUE  1399672941 1399673241
16: 1399673274 37119696  ASK 440.030 0.9133460 TRUE  1399672974 1399673274
17: 1399673360 37119705  BID 440.030 0.0580000 TRUE  1399673060 1399673360
18: 1399673433 37119709  ASK 440.002 0.0319611 TRUE  1399673133 1399673433
19: 1399673506 37119711  ASK 440.002 0.2618460 TRUE  1399673206 1399673506
20: 1399673507 37119712  BID 440.002 1.0000000 TRUE  1399673207 1399673507

where:

  • time is a unix timestamp
  • id is the trade ID assigned by the exchange
  • start.point = "time" minus 5 minutes
  • end.point = equal to the variable "time"

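The series is not equally spaced. The variables start.point and end.point effectively create a 5-minute moving window ending at the variable "time". I want to calculate the frequency of trades within each window.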
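As a minimal sketch of how such window columns could be built (the question does not show this step, so the construction and the name window.secs are assumptions), using the six ASK trades from the structure posted further down as sample data:

```r
library(data.table)

# Sample data: the six ASK trades from the question's dput() output
trades <- data.table(
  time = c(1399672906L, 1399673105L, 1399673274L,
           1399673433L, 1399673506L, 1399673531L),
  id   = c(37119594L, 37119638L, 37119696L,
           37119709L, 37119711L, 37119717L)
)

window.secs <- 300L  # 5 minutes in seconds (hypothetical name)

# Each row gets a window [time - 300, time] that ends at its own timestamp
trades[, `:=`(start.point = time - window.secs, end.point = time)]
```

Keeping all three columns as integers avoids type mismatches if they are later compared or joined.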

I did it with a for loop:

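# status.bar was not defined in the snippet; assumed to be created like this
status.bar <- txtProgressBar(min = 0, max = nrow(trades), style = 3)
for (i in seq_len(nrow(trades))) {
  # count the distinct trade ids whose time falls inside row i's window
  trades[i, freq := length(unique(trades[time >= start.point[i] & time <= end.point[i]]$id))]
  setTxtProgressBar(status.bar, i)
}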

However, I wonder whether there is a more idiomatic data.table way. I tried something like:

trades[, freq := list(length(unique(trades[time >= start.point & time <= end.point,]$id))), by = list(id)]

But the result is wrong; it does not seem to operate on a per-row basis:

            time       id type   price       size  api start.point  end.point freq
  1: 1399672906 37119594  ASK 440.002  1.4840000 TRUE  1399672606 1399672906  100
  2: 1399672940 37119597  BID 441.000  0.1758830 TRUE  1399672640 1399672940  100
  3: 1399672940 37119598  BID 441.000  0.0491166 TRUE  1399672640 1399672940  100
  4: 1399673105 37119638  ASK 440.002  0.1313700 TRUE  1399672805 1399673105  100
  5: 1399673198 37119668  BID 441.000  0.0233013 TRUE  1399672898 1399673198  100
  6: 1399673198 37119669  BID 441.000  0.9744230 TRUE  1399672898 1399673198  100
  7: 1399673208 37119675  BID 441.000  0.1587060 TRUE  1399672908 1399673208  100
  8: 1399673208 37119676  BID 441.000  0.1238870 TRUE  1399672908 1399673208  100
  9: 1399673208 37119677  BID 441.001  0.0100000 TRUE  1399672908 1399673208  100
 10: 1399673208 37119678  BID 441.175  0.0129740 TRUE  1399672908 1399673208  100
 11: 1399673208 37119679  BID 441.192  0.0100000 TRUE  1399672908 1399673208  100
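A likely explanation for the constant value, offered as a sketch rather than a definitive diagnosis: inside the inner trades[...], start.point and end.point refer to whole columns, so the comparison is elementwise (each row's time against its own window) and is TRUE for every row. Every group therefore sees the full table. The snippet below uses the six ASK trades from the question's update as sample data and evaluates one window at a time with sapply; the window construction is an assumption, and uniqueN is data.table's distinct-count helper.

```r
library(data.table)

# Sample data: the six ASK trades from the question's dput() output
trades <- data.table(
  time = c(1399672906L, 1399673105L, 1399673274L,
           1399673433L, 1399673506L, 1399673531L),
  id   = c(37119594L, 37119638L, 37119696L,
           37119709L, 37119711L, 37119717L)
)
trades[, `:=`(start.point = time - 300L, end.point = time)]

# Elementwise comparison: each row is trivially inside its own window
all.elementwise <- trades[, all(time >= start.point & time <= end.point)]

# Row-wise evaluation: for window i, count distinct ids with time inside it
trades[, freq := sapply(seq_len(.N), function(i) {
  uniqueN(id[time >= start.point[i] & time <= end.point[i]])
})]

print(trades$freq)  # 1 2 2 2 3 4
```

This is still quadratic in the number of rows, like the for loop, but it avoids one `[.data.table` call per row.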

Update:

See the structure below:

structure(list(time = c(1399672906L, 1399673105L, 1399673274L, 
1399673433L, 1399673506L, 1399673531L), id = c(37119594L, 37119638L, 
37119696L, 37119709L, 37119711L, 37119717L), type = c("ASK", 
"ASK", "ASK", "ASK", "ASK", "ASK"), price = c(440.002, 440.002, 
440.03, 440.002, 440.002, 440), size = c(1.484, 0.13137, 0.913346, 
0.0319611, 0.261846, 3.168), api = c(TRUE, TRUE, TRUE, TRUE, 
TRUE, TRUE), start.point = c(1399672606, 1399672805, 1399672974, 
1399673133, 1399673206, 1399673231), end.point = c(1399672906L, 
1399673105L, 1399673274L, 1399673433L, 1399673506L, 1399673531L
), freq = c(1L, 4L, 13L, 14L, 13L, 11L)), .Names = c("time", 
"id", "type", "price", "size", "api", "start.point", "end.point", 
"freq"), sorted = c("type", "time"), class = c("data.table", 
"data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x0000000002e50788>)

I think that, for now, this is best accomplished with the IRanges package from Bioconductor, until interval joins / range joins are implemented in data.table.

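library(data.table)
library(IRanges)  # Bioconductor package
ir1 = IRanges(trades$time, width=1L)                 # each trade time as a width-1 range
ir2 = IRanges(trades$start.point, trades$end.point)  # each row's 5-minute window

# pairs (trade, window) where the trade time lies within the window
olaps = findOverlaps(ir1, ir2, type = "within")
# number of trades per window; V2 is the row index of the window
dt = data.table(queryHits(olaps), subjectHits(olaps))[, .N, by=V2]

trades[dt$V2, freq := dt$N]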

#          time       id type   price      size  api start.point  end.point freq
# 1: 1399672906 37119594  ASK 440.002 1.4840000 TRUE  1399672606 1399672906    1
# 2: 1399673105 37119638  ASK 440.002 0.1313700 TRUE  1399672805 1399673105    2
# 3: 1399673274 37119696  ASK 440.030 0.9133460 TRUE  1399672974 1399673274    2
# 4: 1399673433 37119709  ASK 440.002 0.0319611 TRUE  1399673133 1399673433    2
# 5: 1399673506 37119711  ASK 440.002 0.2618460 TRUE  1399673206 1399673506    3
# 6: 1399673531 37119717  ASK 440.000 3.1680000 TRUE  1399673231 1399673531    4
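Range joins have since been added to data.table itself: foverlaps() arrived in 1.9.4 and non-equi joins in 1.9.8. The following is a minimal sketch of the same count using a non-equi self-join, assuming a recent data.table version and integer time columns. The sample data are the six ASK trades from the question, and the name counts is illustrative.

```r
library(data.table)

# Sample data: the six ASK trades from the question's dput() output
trades <- data.table(
  time = c(1399672906L, 1399673105L, 1399673274L,
           1399673433L, 1399673506L, 1399673531L),
  id   = c(37119594L, 37119638L, 37119696L,
           37119709L, 37119711L, 37119717L)
)
trades[, `:=`(start.point = time - 300L, end.point = time)]

# Non-equi self-join: for every row of the inner `trades` (the windows),
# count the rows of the outer `trades` whose time lies in
# [start.point, end.point]. by = .EACHI returns one count per window row,
# in the original row order.
counts <- trades[trades,
                 on = .(time >= start.point, time <= end.point),
                 .N, by = .EACHI]

trades[, freq := counts$N]

print(trades$freq)  # 1 2 2 2 3 4
```

In the on clause, the left-hand side of each condition names a column of the outer table and the right-hand side a column of the table passed as i. Like the IRanges version, this counts rows rather than distinct ids, which is equivalent here because every id is unique.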

HTH

