Apply a rolling function to a data.table in R with a custom window size
R - rolling window over data.table
I have the following data.table:
time id type price size api start.point end.point
1: 1399672906 37119594 ASK 440.002 1.4840000 TRUE 1399672606 1399672906
2: 1399672940 37119597 BID 441.000 0.1758830 TRUE 1399672640 1399672940
3: 1399672940 37119598 BID 441.000 0.0491166 TRUE 1399672640 1399672940
4: 1399673105 37119638 ASK 440.002 0.1313700 TRUE 1399672805 1399673105
5: 1399673198 37119668 BID 441.000 0.0233013 TRUE 1399672898 1399673198
6: 1399673198 37119669 BID 441.000 0.9744230 TRUE 1399672898 1399673198
7: 1399673208 37119675 BID 441.000 0.1587060 TRUE 1399672908 1399673208
8: 1399673208 37119676 BID 441.000 0.1238870 TRUE 1399672908 1399673208
9: 1399673208 37119677 BID 441.001 0.0100000 TRUE 1399672908 1399673208
10: 1399673208 37119678 BID 441.175 0.0129740 TRUE 1399672908 1399673208
11: 1399673208 37119679 BID 441.192 0.0100000 TRUE 1399672908 1399673208
12: 1399673208 37119680 BID 441.399 0.0129740 TRUE 1399672908 1399673208
13: 1399673208 37119681 BID 441.499 1.7500000 TRUE 1399672908 1399673208
14: 1399673208 37119682 BID 441.500 8.0214600 TRUE 1399672908 1399673208
15: 1399673241 37119691 BID 441.500 0.0453001 TRUE 1399672941 1399673241
16: 1399673274 37119696 ASK 440.030 0.9133460 TRUE 1399672974 1399673274
17: 1399673360 37119705 BID 440.030 0.0580000 TRUE 1399673060 1399673360
18: 1399673433 37119709 ASK 440.002 0.0319611 TRUE 1399673133 1399673433
19: 1399673506 37119711 ASK 440.002 0.2618460 TRUE 1399673206 1399673506
20: 1399673507 37119712 BID 440.002 1.0000000 TRUE 1399673207 1399673507
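Where:
The series is not equidistant. The variables start.point and end.point define a 5-minute moving window that ends at each row's "time". I want to compute the frequency of trades within each window.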
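For reference, window columns like these could be built as in the minimal sketch below. It assumes the window length is exactly 300 seconds (which matches end.point - start.point in the table above) and uses a hypothetical three-row subset; `window.secs` is an illustrative name, not something from the original post.

```r
library(data.table)

# Hypothetical subset: three timestamps taken from the table above.
trades <- data.table(time = c(1399672906L, 1399673105L, 1399673274L))

# Assumed window length: 5 minutes expressed in seconds.
window.secs <- 300L

# Each window ends at the trade's own timestamp and reaches back 300 seconds.
trades[, `:=`(start.point = time - window.secs, end.point = time)]
print(trades)
```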
I did it with a for loop:
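# status.bar is a txtProgressBar object created beforehand (not shown)
for (i in 1:nrow(trades)) {
  # count the distinct trade ids whose time falls inside row i's window
  trades[i, freq := length(unique(trades[time >= start.point[i] & time <= end.point[i]]$id))]
  setTxtProgressBar(status.bar, i)
}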
However, I wonder whether there is a more "fancy" data.table way. I tried something like:
trades[, freq := list(length(unique(trades[time >= start.point & time <= end.point,]$id))), by = list(id)]
But the result is wrong; it does not seem to operate on a per-row basis:
time id type price size api start.point end.point freq
1: 1399672906 37119594 ASK 440.002 1.4840000 TRUE 1399672606 1399672906 100
2: 1399672940 37119597 BID 441.000 0.1758830 TRUE 1399672640 1399672940 100
3: 1399672940 37119598 BID 441.000 0.0491166 TRUE 1399672640 1399672940 100
4: 1399673105 37119638 ASK 440.002 0.1313700 TRUE 1399672805 1399673105 100
5: 1399673198 37119668 BID 441.000 0.0233013 TRUE 1399672898 1399673198 100
6: 1399673198 37119669 BID 441.000 0.9744230 TRUE 1399672898 1399673198 100
7: 1399673208 37119675 BID 441.000 0.1587060 TRUE 1399672908 1399673208 100
8: 1399673208 37119676 BID 441.000 0.1238870 TRUE 1399672908 1399673208 100
9: 1399673208 37119677 BID 441.001 0.0100000 TRUE 1399672908 1399673208 100
10: 1399673208 37119678 BID 441.175 0.0129740 TRUE 1399672908 1399673208 100
11: 1399673208 37119679 BID 441.192 0.0100000 TRUE 1399672908 1399673208 100
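A likely reason for the constant value: inside the inner trades[...] call, time, start.point and end.point all resolve to whole columns of the inner table. Each row is therefore compared only with its own window, every row passes, and the result is the number of distinct ids in the entire table (which would explain the 100 if the full table holds 100 distinct ids). The sketch below illustrates this on six rows taken from the sample data in this post, with the windows rebuilt as time - 300 (an assumption). It also shows one per-row variant that copies the bounds into scalars first; `lo` and `hi` are illustrative names.

```r
library(data.table)

# Six trades from the sample in this post; windows rebuilt as time - 300 (assumed).
trades <- data.table(
  time = c(1399672906L, 1399673105L, 1399673274L,
           1399673433L, 1399673506L, 1399673531L),
  id   = c(37119594L, 37119638L, 37119696L,
           37119709L, 37119711L, 37119717L)
)
trades[, `:=`(start.point = time - 300L, end.point = time)]

# All three names are full columns here, so row k is tested only against
# its own window and every row is kept.
kept <- trades[time >= start.point & time <= end.point]
nrow(kept) == nrow(trades)

# Per-row variant: scalar bounds are compared against the whole time column.
freq <- vapply(seq_len(nrow(trades)), function(k) {
  lo <- trades$start.point[k]
  hi <- trades$end.point[k]
  trades[time >= lo & time <= hi, uniqueN(id)]
}, integer(1))
print(freq)
```

This still visits the rows one at a time, so it is not faster than the for loop; it only makes the scoping explicit.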
Update:
See the structure below:
structure(list(time = c(1399672906L, 1399673105L, 1399673274L,
1399673433L, 1399673506L, 1399673531L), id = c(37119594L, 37119638L,
37119696L, 37119709L, 37119711L, 37119717L), type = c("ASK",
"ASK", "ASK", "ASK", "ASK", "ASK"), price = c(440.002, 440.002,
440.03, 440.002, 440.002, 440), size = c(1.484, 0.13137, 0.913346,
0.0319611, 0.261846, 3.168), api = c(TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE), start.point = c(1399672606, 1399672805, 1399672974,
1399673133, 1399673206, 1399673231), end.point = c(1399672906L,
1399673105L, 1399673274L, 1399673433L, 1399673506L, 1399673531L
), freq = c(1L, 4L, 13L, 14L, 13L, 11L)), .Names = c("time",
"id", "type", "price", "size", "api", "start.point", "end.point",
"freq"), sorted = c("type", "time"), class = c("data.table",
"data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x0000000002e50788>)
I think that, for now, this is best done with the Bioconductor package IRanges, until interval joins / range joins are implemented in data.table.
require(IRanges)
ir1 = IRanges(trades$time, width=1L)
ir2 = IRanges(trades$start.point, trades$end.point)
olaps = findOverlaps(ir1, ir2, type = "within")
dt = data.table(queryHits(olaps), subjectHits(olaps))[, .N, by=V2]
trades[dt$V2, freq := dt$N]
# time id type price size api start.point end.point freq
# 1: 1399672906 37119594 ASK 440.002 1.4840000 TRUE 1399672606 1399672906 1
# 2: 1399673105 37119638 ASK 440.002 0.1313700 TRUE 1399672805 1399673105 2
# 3: 1399673274 37119696 ASK 440.030 0.9133460 TRUE 1399672974 1399673274 2
# 4: 1399673433 37119709 ASK 440.002 0.0319611 TRUE 1399673133 1399673433 2
# 5: 1399673506 37119711 ASK 440.002 0.2618460 TRUE 1399673206 1399673506 3
# 6: 1399673531 37119717 ASK 440.000 3.1680000 TRUE 1399673231 1399673531 4
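Since this was written, data.table has gained non-equi joins (version 1.9.8 and later), so the same count can probably be obtained without IRanges. The following is a minimal sketch rather than a definitive implementation: it rebuilds the six sample rows with integer columns and assumes the windows are time - 300; `counts` is an illustrative name.

```r
library(data.table)

# Six sample trades; window columns rebuilt as integers (assumed: time - 300).
trades <- data.table(
  time = c(1399672906L, 1399673105L, 1399673274L,
           1399673433L, 1399673506L, 1399673531L),
  id   = c(37119594L, 37119638L, 37119696L,
           37119709L, 37119711L, 37119717L)
)
trades[, `:=`(start.point = time - 300L, end.point = time)]

# Non-equi self-join: the inner `trades` supplies one window per row; the outer
# `trades` supplies the trades whose time lies in [start.point, end.point].
# by = .EACHI evaluates j once per window, and `id` refers to the matched trades.
counts <- trades[trades,
                 .(freq = uniqueN(id)),
                 on = .(time >= start.point, time <= end.point),
                 by = .EACHI]

# One result row per window, in the original row order.
trades[, freq := counts$freq]
print(trades)
```

On these six rows the counts agree with the IRanges result above (1, 2, 2, 2, 3, 4). Keeping time, start.point and end.point in the same type avoids type coercion in the join.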
HTH