
Is there any way to compare one given row in a data table with other rows within a group?

I would like to compare a given row with every row of another column within a specific group of a data.table (in this case the group is defined by second). For instance, suppose I have the following data table:

library(data.table)

dt <- data.table(bSIDE  = c(0,0,0,0,1,1,1,1,0,0),
                 EX     = c(1,3,9,14,1,3,5,14,1,2),
                 second = c(0,0,0,0,0,0,0,0,1,1),
                 PRICE1 = c(NA,NA,NA,NA,127.47,127.47,127.47,127.47,NA,NA),
                 PRICE2 = c(127.49,127.48,127.58,127.46,NA,NA,NA,NA,127.48,127.48))

I would like to compare the first row of column PRICE1 within the group second = 0 and EX = 1 with every row of column PRICE2 within second = 0. If that PRICE1 (127.47) is larger at least once than a non-NA price in PRICE2 (within the group second = 0), a dummy should be created with the value 1; otherwise it should take the value 0. In this case the condition is never met, so for EX = 1 within second = 0 the dummy should be 0. This procedure should be repeated for every EX within the group second = 0.

The same applies when comparing PRICE2 with PRICE1, but with the condition reversed: if PRICE2 for a given EX within second = 0 is lower at least once than any row of PRICE1 within second = 0, the dummy should take the value 1, and 0 otherwise. Thus, I would like to get the following:

objective <- data.table(bSIDE   = c(0,0,0,0,1,1,1,1,0,0),
                        EX      = c(1,3,9,14,1,3,5,14,1,2),
                        second  = c(0,0,0,0,0,0,0,0,1,1),
                        PRICE1  = c(NA,NA,NA,NA,127.47,127.47,127.47,127.47,NA,NA),
                        PRICE2  = c(127.49,127.48,127.58,127.46,NA,NA,NA,NA,127.48,127.48),
                        dPRICE1 = c(NA,NA,NA,NA,0,0,0,0,NA,NA),
                        dPRICE2 = c(0,0,0,1,NA,NA,NA,NA,NA,NA))

I have a potential solution to this problem, but it is very "expensive" in terms of memory. The solution was to create a column for every exchange within the group bSIDE and then compare it row by row. This consumes a lot of memory, which I want to avoid because the data table may reach 9 million observations.

Thank you!

I can't say I really understood your "rules"; your data format is very strange, and I would recommend taking a step back and rethinking the former because this sounds like an XY problem to me. Your data somehow has an awkwardly mixed long and wide data format.
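
To illustrate the remark about the mixed long/wide layout, here is a minimal sketch of a purely long version of the sample data. It assumes that exactly one of PRICE1 and PRICE2 is filled in each row and that bSIDE already identifies which one, which holds for the sample but may not hold for the real data. The names `long` and `PRICE` are made up for this example.

```r
library(data.table)

# Sample data from the question.
dt <- data.table(
  bSIDE  = c(0, 0, 0, 0, 1, 1, 1, 1, 0, 0),
  EX     = c(1, 3, 9, 14, 1, 3, 5, 14, 1, 2),
  second = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1),
  PRICE1 = c(NA, NA, NA, NA, 127.47, 127.47, 127.47, 127.47, NA, NA),
  PRICE2 = c(127.49, 127.48, 127.58, 127.46, NA, NA, NA, NA, 127.48, 127.48)
)

# Assumption: one price per row, with bSIDE saying which side it belongs to.
# fcoalesce() takes the first non-missing value across the two columns, so a
# single PRICE column carries the same information as PRICE1 + PRICE2.
long <- dt[, .(second, bSIDE, EX, PRICE = fcoalesce(PRICE1, PRICE2))]
```

With one price column, comparisons between the two sides become ordinary grouped operations on `PRICE` split by `bSIDE`, rather than lookups across two half-empty columns.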

That aside, the following reproduces your expected output. I don't claim that this generalises to your larger problem, but perhaps it will get you started.

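# Compare each row against the opposite-side price quoted for EX == 1 in the
# same second; the unary + turns the logical result into 0/1 (NA stays NA).
dt[, `:=`(
    dPRICE1 = +(first(PRICE2[EX == 1 & !is.na(PRICE2)]) < PRICE1),
    dPRICE2 = +(first(PRICE1[EX == 1 & !is.na(PRICE1)]) > PRICE2)),
    by = second]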
#    bSIDE EX second PRICE1 PRICE2 dPRICE1 dPRICE2
# 1:     0  1      0     NA 127.49      NA       0
# 2:     0  3      0     NA 127.48      NA       0
# 3:     0  9      0     NA 127.58      NA       0
# 4:     0 14      0     NA 127.46      NA       1
# 5:     1  1      0 127.47     NA       0      NA
# 6:     1  3      0 127.47     NA       0      NA
# 7:     1  5      0 127.47     NA       0      NA
# 8:     1 14      0 127.47     NA       0      NA
# 9:     0  1      1     NA 127.48      NA      NA
#10:     0  2      1     NA 127.48      NA      NA      
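
If the rule is instead read literally as "at least once against any price on the other side in the same second", a row only needs to be compared with the group minimum or maximum of the other column, which avoids building one column per exchange. The following is a sketch under that assumption, not a confirmed statement of the intended rule: the helpers `safe_min` and `safe_max` and the column names `any_dPRICE1` and `any_dPRICE2` are invented here. Note that on the sample data this reading gives 1 for the PRICE1 rows in second = 0 (127.47 exceeds 127.46), which differs from the posted objective.

```r
library(data.table)

# Sample data from the question.
dt <- data.table(
  bSIDE  = c(0, 0, 0, 0, 1, 1, 1, 1, 0, 0),
  EX     = c(1, 3, 9, 14, 1, 3, 5, 14, 1, 2),
  second = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1),
  PRICE1 = c(NA, NA, NA, NA, 127.47, 127.47, 127.47, 127.47, NA, NA),
  PRICE2 = c(127.49, 127.48, 127.58, 127.46, NA, NA, NA, NA, 127.48, 127.48)
)

# Hypothetical helpers: return NA instead of +/-Inf (with a warning) when a
# group has no non-missing value on that side.
safe_min <- function(x) if (all(is.na(x))) NA_real_ else min(x, na.rm = TRUE)
safe_max <- function(x) if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)

# "PRICE1 exceeds some PRICE2 in the group" is the same as "PRICE1 exceeds the
# smallest PRICE2"; likewise "PRICE2 is below some PRICE1" is the same as
# "PRICE2 is below the largest PRICE1". Only one scalar per group is needed.
dt[, `:=`(
  any_dPRICE1 = +(PRICE1 > safe_min(PRICE2)),
  any_dPRICE2 = +(PRICE2 < safe_max(PRICE1))
), by = second]
```

Because each group contributes only two scalars, the extra memory is two integer columns regardless of how many exchanges there are.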
