[英]R: Using a datatable to aggregate data
I'm new to using data tables and would like some help aggregating some data. 我是新手使用数据表,想要一些帮助聚合一些数据。
Login OpenTime CloseTime OpenedValueUSD ClosedValueUSD Year Month TransferredValue Identifier
859 04/02/2014 07:55 05/02/2014 15:37 10000 10000 2014 2 0 1
859 07/02/2014 03:16 07/02/2014 03:51 8960.755 8960.755 2014 2 0 2
859 11/02/2014 12:41 13/02/2014 11:56 13635.178 13606.901 2014 2 0 3
859 11/02/2014 13:34 11/02/2014 15:34 13635.178 13635.178 2014 2 13635.178 4
859 12/02/2014 13:46 14/02/2014 09:59 13660.246 13649.278 2014 2 13635.178 5
859 13/02/2014 15:33 13/02/2014 15:42 13606.901 13606.901 2014 2 13660.246 6
859 25/03/2014 14:52 26/03/2014 12:58 10000 10000 2014 3 0 7
For each row, I would like to aggregate all trades that opened prior to that trade and close after that trade is opened. 对于每一行,我想汇总在该交易之前开立的所有交易,并在该交易开启后关闭。 For example, the trade in the third row opened prior to the trade in the fourth, but only closed after the fourth trade opened. 例如,第三行的交易在第四次交易之前开盘,但仅在第四次交易开盘后关闭。 So I then take the OpenedValueUSD for that trade (and any other appropriate trades (none, in this case)) and place it in the TransferredValue column. 因此,我接着使用OpenedValueUSD进行交易(以及任何其他适当的交易(在本例中为无))并将其放在TransferredValue列中。
Here is current code: 这是当前代码:
tradeData[,TransferredValue:=sum(tradeData$OpenedValueUSD[OpenTime <
tradeData$OpenTime & CloseTime > tradeData$OpenTime & Login ==
tradeData$Login]), by="Identifier"]
Here's another way using foverlaps()
which doesn't require row-wise grouping. 这是使用foverlaps()
的另一种方式,它不需要按行分组。 I'll call your data.table dt
. 我会打电话给你的data.table dt
。
Convert OpenTime
and CloseTime
to POSIXct format, as shown by @alex23lemm. 将OpenTime
和CloseTime
转换为POSIXct格式,如@ alex23lemm所示。
Add a temporary column tmpTime
which is equal to OpenTime
. 添加一个临时列tmpTime
,它等于OpenTime
。 We will use this in foverlaps()
. 我们将在foverlaps()
使用它。
dt[, tmpTime := OpenTime]
setkey()
on Login, OpenTime, CloseTime
colums. Login, OpenTime, CloseTime
上的setkey()
。
setkey(dt, Login, OpenTime, CloseTime)
Using foverlaps()
, we will now get which intervals in Login, OpenTime, tmpTime
fall entirely within Login, OpenTime, CloseTime
. 使用foverlaps()
,我们现在将在Login, OpenTime, tmpTime
哪些时间间隔完全落在 Login, OpenTime, CloseTime
。
olaps = foverlaps(dt, dt, by.x=c("Login", "OpenTime", "tmpTime"), which=TRUE, nomatch=0L, type="within")
by.y
is automatically taken to be the key columns. by.y
自动被视为关键列。
Remove self-overlaps, ie, remove those where xid == yid
. 删除自重叠,即删除xid == yid
那些。
olaps = olaps[xid != yid] # xid yid # 1: 4 3 # 2: 5 3 # 3: 6 5
Assign to xid
rows the values corresponding to yid
. 将对应于yid
的值分配给xid
行。 And remove tmpTime
. 并删除tmpTime
。
dt[olaps$xid, TransferredValue := dt$OpenedValueUSD[olaps$yid]][, tmpTime := NULL] # Login OpenTime CloseTime OpenedValueUSD ClosedValueUSD Year Month TransferredValue Identifier # 1: 859 2014-02-04 07:55:00 2014-02-05 15:37:00 10000.000 10000.000 2014 2 0.00 1 # 2: 859 2014-02-07 03:16:00 2014-02-07 03:51:00 8960.755 8960.755 2014 2 0.00 2 # 3: 859 2014-02-11 12:41:00 2014-02-13 11:56:00 13635.178 13606.901 2014 2 0.00 3 # 4: 859 2014-02-11 13:34:00 2014-02-11 15:34:00 13635.178 13635.178 2014 2 13635.18 4 # 5: 859 2014-02-12 13:46:00 2014-02-14 09:59:00 13660.246 13649.278 2014 2 13635.18 5 # 6: 859 2014-02-13 15:33:00 2014-02-13 15:42:00 13606.901 13606.901 2014 2 13660.25 6 # 7: 859 2014-03-25 14:52:00 2014-03-26 12:58:00 10000.000 10000.000 2014 3 0.00 7
This should produce the expected result: 这应该产生预期的结果:
tradeData[,OpenTime:=as.POSIXct(OpenTime,format="%d/%m/%Y %H:%M")]
tradeData[,CloseTime:=as.POSIXct(CloseTime,format="%d/%m/%Y %H:%M")]
tradeData[,TransferredValue:=sum(tradeData$OpenedValueUSD[tradeData$OpenTime < OpenTime &
tradeData$CloseTime > OpenTime]), by = 'Identifier']
tradeData
# Login OpenTime CloseTime OpenedValueUSD ClosedValueUSD Year Month
# 1: 859 2014-02-04 07:55:00 2014-02-05 15:37:00 10000.000 10000.000 2014 2
# 2: 859 2014-02-07 03:16:00 2014-02-07 03:51:00 8960.755 8960.755 2014 2
# 3: 859 2014-02-11 12:41:00 2014-02-13 11:56:00 13635.178 13606.901 2014 2
# 4: 859 2014-02-11 13:34:00 2014-02-11 15:34:00 13635.178 13635.178 2014 2
# 5: 859 2014-02-12 13:46:00 2014-02-14 09:59:00 13660.246 13649.278 2014 2
# 6: 859 2014-02-13 15:33:00 2014-02-13 15:42:00 13606.901 13606.901 2014 2
# 7: 859 2014-03-25 14:52:00 2014-03-26 12:58:00 10000.000 10000.000 2014 3
# Identifier TransferredValue
# 1: 1 0.00
# 2: 2 0.00
# 3: 3 0.00
# 4: 4 13635.18
# 5: 5 13635.18
# 6: 6 13660.25
# 7: 7 0.00
Data: 数据:
tradeData <- data.table(Login = c(859, 859, 859, 859, 859, 859, 859),
OpenTime = c("04/02/2014 07:55", "07/02/2014 03:16", "11/02/2014 12:41", "11/02/2014 13:34", "12/02/2014 13:46",
"13/02/2014 15:33", "25/03/2014 14:52"),
CloseTime = c("05/02/2014 15:37", "07/02/2014 03:51", "13/02/2014 11:56", "11/02/2014 15:34", "14/02/2014 09:59",
"13/02/2014 15:42", "26/03/2014 12:58"),
OpenedValueUSD = c(10000.000, 8960.755, 13635.178, 13635.178, 13660.246, 13606.901, 10000.000),
ClosedValueUSD = c(10000.000, 8960.755, 13606.901, 13635.178, 13649.278, 13606.901, 10000.000),
Year = c(2014, 2014, 2014, 2014, 2014, 2014, 2014),
Month = c(2, 2, 2, 2, 2, 2, 3),
Identifier = c(1, 2, 3, 4, 5, 6, 7))
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.