
R: Using a data.table to aggregate data

I'm new to using data tables and would like some help aggregating data.

Login   OpenTime            CloseTime     OpenedValueUSD    ClosedValueUSD  Year    Month   TransferredValue Identifier
859    04/02/2014 07:55 05/02/2014 15:37    10000               10000       2014    2             0                1
859    07/02/2014 03:16 07/02/2014 03:51    8960.755            8960.755    2014    2             0                2
859    11/02/2014 12:41 13/02/2014 11:56    13635.178           13606.901   2014    2             0                3
859    11/02/2014 13:34 11/02/2014 15:34    13635.178           13635.178   2014    2             13635.178        4
859    12/02/2014 13:46 14/02/2014 09:59    13660.246           13649.278   2014    2             13635.178        5
859    13/02/2014 15:33 13/02/2014 15:42    13606.901           13606.901   2014    2             13660.246        6
859    25/03/2014 14:52 26/03/2014 12:58    10000               10000       2014    3             0                7

For each row, I would like to aggregate all trades that opened before that trade and closed after that trade opened. For example, the trade in the third row opened before the trade in the fourth row but closed only after the fourth trade had opened. So I take the OpenedValueUSD of that third trade (plus that of any other qualifying trades; there are none in this case) and place the total in the fourth row's TransferredValue column.

Here is my current code:

tradeData[,TransferredValue:=sum(tradeData$OpenedValueUSD[OpenTime < 
           tradeData$OpenTime & CloseTime > tradeData$OpenTime & Login == 
           tradeData$Login]), by="Identifier"]

Here's another way using foverlaps(), which doesn't require row-wise grouping. I'll call your data.table dt.

  1. Convert OpenTime and CloseTime to POSIXct format, as shown by @alex23lemm.

  2. Add a temporary column tmpTime equal to OpenTime. We will use this in foverlaps().

     dt[, tmpTime := OpenTime]

  3. setkey() on the Login, OpenTime, CloseTime columns.

     setkey(dt, Login, OpenTime, CloseTime)

  4. Using foverlaps(), we now find which intervals in Login, OpenTime, tmpTime fall entirely within Login, OpenTime, CloseTime.

     olaps = foverlaps(dt, dt, by.x=c("Login", "OpenTime", "tmpTime"), which=TRUE, nomatch=0L, type="within")

     by.y is automatically taken to be the key columns.

  5. Remove self-overlaps, i.e., remove the rows where xid == yid.

     olaps = olaps[xid != yid]
     #    xid yid
     # 1:   4   3
     # 2:   5   3
     # 3:   6   5

  6. Assign to the xid rows the values corresponding to yid, and remove tmpTime.

     dt[olaps$xid, TransferredValue := dt$OpenedValueUSD[olaps$yid]][, tmpTime := NULL]
     #    Login            OpenTime           CloseTime OpenedValueUSD ClosedValueUSD Year Month TransferredValue Identifier
     # 1:   859 2014-02-04 07:55:00 2014-02-05 15:37:00      10000.000      10000.000 2014     2             0.00          1
     # 2:   859 2014-02-07 03:16:00 2014-02-07 03:51:00       8960.755       8960.755 2014     2             0.00          2
     # 3:   859 2014-02-11 12:41:00 2014-02-13 11:56:00      13635.178      13606.901 2014     2             0.00          3
     # 4:   859 2014-02-11 13:34:00 2014-02-11 15:34:00      13635.178      13635.178 2014     2         13635.18          4
     # 5:   859 2014-02-12 13:46:00 2014-02-14 09:59:00      13660.246      13649.278 2014     2         13635.18          5
     # 6:   859 2014-02-13 15:33:00 2014-02-13 15:42:00      13606.901      13606.901 2014     2         13660.25          6
     # 7:   859 2014-03-25 14:52:00 2014-03-26 12:58:00      10000.000      10000.000 2014     3             0.00          7
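The six steps can be put together into one runnable sketch. It uses the sample data from the question, reduced to the columns the approach needs. Two details are assumptions rather than part of the answer: the times are parsed with tz = "UTC", and the matched values are summed per xid (via a helper table named sums), on the reading that a trade overlapped by several earlier trades should receive their total. In the sample data each trade has at most one match, so the result is the same either way. Note also that type="within" matches on inclusive boundaries, so a trade that opens at exactly the moment another opens or closes would count as overlapping.

```r
library(data.table)

# Sample data from the question (only the columns used below)
dt <- data.table(
  Login          = rep(859, 7),
  OpenTime       = c("04/02/2014 07:55", "07/02/2014 03:16", "11/02/2014 12:41",
                     "11/02/2014 13:34", "12/02/2014 13:46", "13/02/2014 15:33",
                     "25/03/2014 14:52"),
  CloseTime      = c("05/02/2014 15:37", "07/02/2014 03:51", "13/02/2014 11:56",
                     "11/02/2014 15:34", "14/02/2014 09:59", "13/02/2014 15:42",
                     "26/03/2014 12:58"),
  OpenedValueUSD = c(10000.000, 8960.755, 13635.178, 13635.178,
                     13660.246, 13606.901, 10000.000),
  Identifier     = 1:7
)

# Step 1: parse day/month/year strings as POSIXct (the UTC time zone is an assumption)
fmt <- "%d/%m/%Y %H:%M"
dt[, OpenTime  := as.POSIXct(OpenTime,  format = fmt, tz = "UTC")]
dt[, CloseTime := as.POSIXct(CloseTime, format = fmt, tz = "UTC")]

# Steps 2-3: zero-length interval [OpenTime, tmpTime] per trade, then key the table
dt[, tmpTime := OpenTime]
setkey(dt, Login, OpenTime, CloseTime)

# Steps 4-5: find trades whose open instant lies within another trade's interval
olaps <- foverlaps(dt, dt, by.x = c("Login", "OpenTime", "tmpTime"),
                   type = "within", which = TRUE, nomatch = 0L)
olaps <- olaps[xid != yid]

# Step 6 (adapted): sum the overlapping trades' OpenedValueUSD per target row
sums <- olaps[, .(total = sum(dt$OpenedValueUSD[yid])), by = xid]
dt[, TransferredValue := 0]
dt[sums$xid, TransferredValue := sums$total]
dt[, tmpTime := NULL]

dt[, .(Identifier, TransferredValue)]
```

Rows with no overlapping trade keep the initial value of 0.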

This should produce the expected result:

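# Parse the day/month/year strings as date-times
tradeData[,OpenTime:=as.POSIXct(OpenTime,format="%d/%m/%Y %H:%M")]
tradeData[,CloseTime:=as.POSIXct(CloseTime,format="%d/%m/%Y %H:%M")]

# For each trade, sum OpenedValueUSD over trades of the same Login that
# opened before it and closed after it opened
tradeData[,TransferredValue:=sum(tradeData$OpenedValueUSD[tradeData$OpenTime < OpenTime & 
                                                            tradeData$CloseTime > OpenTime &
                                                            tradeData$Login == Login]), by = 'Identifier']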
tradeData


#    Login            OpenTime           CloseTime OpenedValueUSD ClosedValueUSD Year Month
# 1:   859 2014-02-04 07:55:00 2014-02-05 15:37:00      10000.000      10000.000 2014     2
# 2:   859 2014-02-07 03:16:00 2014-02-07 03:51:00       8960.755       8960.755 2014     2
# 3:   859 2014-02-11 12:41:00 2014-02-13 11:56:00      13635.178      13606.901 2014     2
# 4:   859 2014-02-11 13:34:00 2014-02-11 15:34:00      13635.178      13635.178 2014     2
# 5:   859 2014-02-12 13:46:00 2014-02-14 09:59:00      13660.246      13649.278 2014     2
# 6:   859 2014-02-13 15:33:00 2014-02-13 15:42:00      13606.901      13606.901 2014     2
# 7:   859 2014-03-25 14:52:00 2014-03-26 12:58:00      10000.000      10000.000 2014     3
#    Identifier TransferredValue
# 1:          1             0.00
# 2:          2             0.00
# 3:          3             0.00
# 4:          4         13635.18
# 5:          5         13635.18
# 6:          6         13660.25
# 7:          7             0.00
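The grouped sum rescans the whole table once per trade. As an alternative sketch (a different technique, not part of this answer), the same strict conditions can be written as a non-equi self-join, assuming a data.table version that supports non-equi joins in `on`. The names res and total, and the tz = "UTC" setting, are illustrative assumptions. In the `on` clause the left-hand side of each condition refers to the joined-to table and the right-hand side to the current row.

```r
library(data.table)

# Sample data from the question (only the columns used below)
tradeData <- data.table(
  Login          = rep(859, 7),
  OpenTime       = c("04/02/2014 07:55", "07/02/2014 03:16", "11/02/2014 12:41",
                     "11/02/2014 13:34", "12/02/2014 13:46", "13/02/2014 15:33",
                     "25/03/2014 14:52"),
  CloseTime      = c("05/02/2014 15:37", "07/02/2014 03:51", "13/02/2014 11:56",
                     "11/02/2014 15:34", "14/02/2014 09:59", "13/02/2014 15:42",
                     "26/03/2014 12:58"),
  OpenedValueUSD = c(10000.000, 8960.755, 13635.178, 13635.178,
                     13660.246, 13606.901, 10000.000),
  Identifier     = 1:7
)

# Parse day/month/year strings as POSIXct (the UTC time zone is an assumption)
fmt <- "%d/%m/%Y %H:%M"
tradeData[, OpenTime  := as.POSIXct(OpenTime,  format = fmt, tz = "UTC")]
tradeData[, CloseTime := as.POSIXct(CloseTime, format = fmt, tz = "UTC")]

# For every row (the inner tradeData), match trades of the same Login that
# opened strictly before it and closed strictly after it opened, then sum
# their OpenedValueUSD. by = .EACHI yields one result row per input row;
# rows without a match contribute NA, which na.rm = TRUE turns into a sum of 0.
res <- tradeData[tradeData,
                 .(total = sum(x.OpenedValueUSD, na.rm = TRUE)),
                 on = .(Login, OpenTime < OpenTime, CloseTime > OpenTime),
                 by = .EACHI]

# res is in the same row order as tradeData, so the totals can be assigned directly
tradeData[, TransferredValue := res$total]

tradeData[, .(Identifier, TransferredValue)]
```

Unlike foverlaps() with type="within", this keeps the strict inequalities of the grouped version, so trades that share an exact boundary time are not counted.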

Data:

library(data.table)

tradeData <- data.table(Login = c(859, 859, 859, 859, 859, 859, 859),
                        OpenTime = c("04/02/2014 07:55", "07/02/2014 03:16", "11/02/2014 12:41", "11/02/2014 13:34", "12/02/2014 13:46",
                                     "13/02/2014 15:33", "25/03/2014 14:52"),
                        CloseTime = c("05/02/2014 15:37", "07/02/2014 03:51", "13/02/2014 11:56", "11/02/2014 15:34", "14/02/2014 09:59",
                                      "13/02/2014 15:42", "26/03/2014 12:58"),
                        OpenedValueUSD = c(10000.000, 8960.755, 13635.178, 13635.178, 13660.246, 13606.901, 10000.000),
                        ClosedValueUSD = c(10000.000,  8960.755, 13606.901, 13635.178, 13649.278, 13606.901, 10000.000),
                        Year = c(2014, 2014, 2014, 2014, 2014, 2014, 2014),
                        Month = c(2, 2, 2, 2, 2, 2, 3),
                        Identifier = c(1, 2, 3, 4, 5, 6, 7))
