简体   繁体   中英

R: Using a datatable to aggregate data

I'm new to using data tables and would like some help aggregating some data.

Login   OpenTime            CloseTime     OpenedValueUSD    ClosedValueUSD  Year    Month   TransferredValue Identifier
859    04/02/2014 07:55 05/02/2014 15:37    10000               10000       2014    2             0                1
859    07/02/2014 03:16 07/02/2014 03:51    8960.755            8960.755    2014    2             0                2
859    11/02/2014 12:41 13/02/2014 11:56    13635.178           13606.901   2014    2             0                3
859    11/02/2014 13:34 11/02/2014 15:34    13635.178           13635.178   2014    2             13635.178        4
859    12/02/2014 13:46 14/02/2014 09:59    13660.246           13649.278   2014    2             13635.178        5
859    13/02/2014 15:33 13/02/2014 15:42    13606.901           13606.901   2014    2             13660.246        6
859    25/03/2014 14:52 26/03/2014 12:58    10000               10000       2014    3             0                7

For each row, I would like to aggregate all trades that opened prior to that trade and close after that trade is opened. For example, the trade in the third row opened prior to the trade in the fourth, but only closed after the fourth trade opened. So I then take the OpenedValueUSD for that trade (and any other appropriate trades (none, in this case)) and place it in the TransferredValue column.

Here is current code:

tradeData[,TransferredValue:=sum(tradeData$OpenedValueUSD[OpenTime < 
           tradeData$OpenTime & CloseTime > tradeData$OpenTime & Login == 
           tradeData$Login]), by="Identifier"]

Here's another way using foverlaps() which doesn't require row-wise grouping. I'll call your data.table dt .

  1. Convert OpenTime and CloseTime to POSIXct format, as shown by @alex23lemm.

  2. Add a temporary column tmpTime which is equal to OpenTime . We will use this in foverlaps() .

     dt[, tmpTime := OpenTime] 
  3. setkey() on Login, OpenTime, CloseTime colums.

     setkey(dt, Login, OpenTime, CloseTime) 
  4. Using foverlaps() , we will now get which intervals in Login, OpenTime, tmpTime fall entirely within Login, OpenTime, CloseTime .

     olaps = foverlaps(dt, dt, by.x=c("Login", "OpenTime", "tmpTime"), which=TRUE, nomatch=0L, type="within") 

    by.y is automatically taken to be the key columns.

  5. Remove self-overlaps, ie, remove those where xid == yid .

     olaps = olaps[xid != yid] # xid yid # 1: 4 3 # 2: 5 3 # 3: 6 5 
  6. Assign to xid rows the values corresponding to yid . And remove tmpTime .

     dt[olaps$xid, TransferredValue := dt$OpenedValueUSD[olaps$yid]][, tmpTime := NULL] # Login OpenTime CloseTime OpenedValueUSD ClosedValueUSD Year Month TransferredValue Identifier # 1: 859 2014-02-04 07:55:00 2014-02-05 15:37:00 10000.000 10000.000 2014 2 0.00 1 # 2: 859 2014-02-07 03:16:00 2014-02-07 03:51:00 8960.755 8960.755 2014 2 0.00 2 # 3: 859 2014-02-11 12:41:00 2014-02-13 11:56:00 13635.178 13606.901 2014 2 0.00 3 # 4: 859 2014-02-11 13:34:00 2014-02-11 15:34:00 13635.178 13635.178 2014 2 13635.18 4 # 5: 859 2014-02-12 13:46:00 2014-02-14 09:59:00 13660.246 13649.278 2014 2 13635.18 5 # 6: 859 2014-02-13 15:33:00 2014-02-13 15:42:00 13606.901 13606.901 2014 2 13660.25 6 # 7: 859 2014-03-25 14:52:00 2014-03-26 12:58:00 10000.000 10000.000 2014 3 0.00 7 

This should produce the expected result:

tradeData[,OpenTime:=as.POSIXct(OpenTime,format="%d/%m/%Y %H:%M")]
tradeData[,CloseTime:=as.POSIXct(CloseTime,format="%d/%m/%Y %H:%M")]

tradeData[,TransferredValue:=sum(tradeData$OpenedValueUSD[tradeData$OpenTime < OpenTime & 
                                                            tradeData$CloseTime > OpenTime]), by = 'Identifier']
tradeData


    # Login            OpenTime           CloseTime OpenedValueUSD ClosedValueUSD Year Month
# 1:   859 2014-02-04 07:55:00 2014-02-05 15:37:00      10000.000      10000.000 2014     2
# 2:   859 2014-02-07 03:16:00 2014-02-07 03:51:00       8960.755       8960.755 2014     2
# 3:   859 2014-02-11 12:41:00 2014-02-13 11:56:00      13635.178      13606.901 2014     2
# 4:   859 2014-02-11 13:34:00 2014-02-11 15:34:00      13635.178      13635.178 2014     2
# 5:   859 2014-02-12 13:46:00 2014-02-14 09:59:00      13660.246      13649.278 2014     2
# 6:   859 2014-02-13 15:33:00 2014-02-13 15:42:00      13606.901      13606.901 2014     2
# 7:   859 2014-03-25 14:52:00 2014-03-26 12:58:00      10000.000      10000.000 2014     3
# Identifier TransferredValue
# 1:          1             0.00
# 2:          2             0.00
# 3:          3             0.00
# 4:          4         13635.18
# 5:          5         13635.18
# 6:          6         13660.25
# 7:          7             0.00

Data:

tradeData <- data.table(Login = c(859, 859, 859, 859, 859, 859, 859),
                        OpenTime = c("04/02/2014 07:55", "07/02/2014 03:16", "11/02/2014 12:41", "11/02/2014 13:34", "12/02/2014 13:46",
                                     "13/02/2014 15:33", "25/03/2014 14:52"),
                        CloseTime = c("05/02/2014 15:37", "07/02/2014 03:51", "13/02/2014 11:56", "11/02/2014 15:34", "14/02/2014 09:59",
                                      "13/02/2014 15:42", "26/03/2014 12:58"),
                        OpenedValueUSD = c(10000.000, 8960.755, 13635.178, 13635.178, 13660.246, 13606.901, 10000.000),
                        ClosedValueUSD = c(10000.000,  8960.755, 13606.901, 13635.178, 13649.278, 13606.901, 10000.000),
                        Year = c(2014, 2014, 2014, 2014, 2014, 2014, 2014),
                        Month = c(2, 2, 2, 2, 2, 2, 3),
                        Identifier = c(1, 2, 3, 4, 5, 6, 7))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM