Rolling Join multiple columns independently to eliminate NAs

I am trying to do a rolling join in data.table that brings in multiple columns and rolls over both entirely missing rows and individual NAs in particular columns, even when the row itself is present. By way of example, I have two tables, A and B:

library(data.table)
A <- data.table(v1 = c(1,1,1,1,1,2,2,2,2,3,3,3,3), 
                v2 = c(6,6,6,4,4,6,4,4,4,6,4,4,4), 
                t = c(10,20,30,60,60,10,40,50,60,20,40,50,60),
                key = c("v1", "v2", "t"))

B <- data.table(v1 = c(1,1,1,1,2,2,2,2,3,3,3,3), 
                v2 = c(4,4,6,6,4,4,6,6,4,4,6,6), 
                t = c(10,70,20,70,10,70,20,70,10,70,20,70), 
                valA = c('a','a',NA,'a',NA,'a','b','a', 'b','b',NA,'b'), 
                valB = c(NA,'q','q','q','p','p',NA,'p',NA,'q',NA,'q'),
                key = c("v1", "v2", "t"))

B
##     v1 v2  t valA valB
##  1:  1  4 10    a   NA
##  2:  1  4 70    a    q
##  3:  1  6 20   NA    q
##  4:  1  6 70    a    q
##  5:  2  4 10   NA    p
##  6:  2  4 70    a    p
##  7:  2  6 20    b   NA
##  8:  2  6 70    a    p
##  9:  3  4 10    b   NA
## 10:  3  4 70    b    q
## 11:  3  6 20   NA   NA
## 12:  3  6 70    b    q

If I do a rolling join (in this case a backwards join), it rolls over all the points where a row cannot be found in B, but it still returns NA where the row exists in B and the value to be merged is NA:

B[A, roll=-Inf]

##     v1 v2  t valA valB
##  1:  1  4 60    a    q
##  2:  1  4 60    a    q
##  3:  1  6 10   NA    q
##  4:  1  6 20   NA    q
##  5:  1  6 30    a    q
##  6:  2  4 40    a    p
##  7:  2  4 50    a    p
##  8:  2  4 60    a    p
##  9:  2  6 10    b   NA
## 10:  3  4 40    b    q
## 11:  3  4 50    b    q
## 12:  3  4 60    b    q
## 13:  3  6 20   NA   NA

I would like the rolling join to roll over these NAs as well. For a single column, I can subset B to remove the NAs, then roll with A:

C <- B[!is.na(valA), .(v1, v2, t, valA)][A, roll=-Inf]

C
##     v1 v2  t valA
##  1:  1  4 60    a
##  2:  1  4 60    a
##  3:  1  6 10    a
##  4:  1  6 20    a
##  5:  1  6 30    a
##  6:  2  4 40    a
##  7:  2  4 50    a
##  8:  2  4 60    a
##  9:  2  6 10    b
## 10:  3  4 40    b
## 11:  3  4 50    b
## 12:  3  4 60    b
## 13:  3  6 20    b

But for multiple columns, I have to do this sequentially, storing the result after each added column and then repeating the join for the next one:

B[!is.na(valB), .(v1, v2, t, valB)][C, roll=-Inf]

##     v1 v2  t valB valA
##  1:  1  4 60    q    a
##  2:  1  4 60    q    a
##  3:  1  6 10    q    a
##  4:  1  6 20    q    a
##  5:  1  6 30    q    a
##  6:  2  4 40    p    a
##  7:  2  4 50    p    a
##  8:  2  4 60    p    a
##  9:  2  6 10    p    b
## 10:  3  4 40    q    b
## 11:  3  4 50    q    b
## 12:  3  4 60    q    b
## 13:  3  6 20    q    b

The end result above is the desired output, but for multiple columns it quickly becomes unwieldy. Is there a better solution?

Joins are about matching up rows. If you want to match rows multiple ways, you'll need multiple joins.
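
To make this concrete, here is a small sketch using the question's B. It shows that for a single row of A, valA and valB come from different rows of B, which is why one join cannot supply both. The names probe, src_a, src_b, hit_a and hit_b, and the helper column t_src, are illustrative additions and not part of the original data.

```r
library(data.table)

B <- data.table(v1 = c(1,1,1,1,2,2,2,2,3,3,3,3),
                v2 = c(4,4,6,6,4,4,6,6,4,4,6,6),
                t = c(10,70,20,70,10,70,20,70,10,70,20,70),
                valA = c('a','a',NA,'a',NA,'a','b','a','b','b',NA,'b'),
                valB = c(NA,'q','q','q','p','p',NA,'p',NA,'q',NA,'q'),
                key = c("v1", "v2", "t"))

# A single row of A to look up.
probe <- data.table(v1 = 1, v2 = 6, t = 10)

# t_src (hypothetical helper column) records which row of B supplied the value.
src_a <- B[!is.na(valA), .(v1, v2, t, t_src = t, valA)]
src_b <- B[!is.na(valB), .(v1, v2, t, t_src = t, valB)]

# Same lookup row, two independent backward rolling joins.
hit_a <- src_a[probe, on = c("v1", "v2", "t"), roll = -Inf]
hit_b <- src_b[probe, on = c("v1", "v2", "t"), roll = -Inf]

print(hit_a)  # valA is taken from B's row at t = 70
print(hit_b)  # valB is taken from B's row at t = 20
```

Because the matched row differs per column, each value column needs its own join.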

I'd use a loop, but add columns to A (rather than creating new tables C, D, ... following each join):

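k     = key(A)               # join columns: v1, v2, t
bcols = setdiff(names(B), k) # value columns to bring over: valA, valB

# One rolling join per value column: drop B's rows where that column is NA,
# then roll backwards onto A and add the result to A by reference.
for (col in bcols) A[, (col) :=
  B[!.(as(NA, typeof(B[[col]]))), on=col][.SD, roll=-Inf, ..col]
][]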

A 

##     v1 v2  t valA valB
##  1:  1  4 60    a    q
##  2:  1  4 60    a    q
##  3:  1  6 10    a    q
##  4:  1  6 20    a    q
##  5:  1  6 30    a    q
##  6:  2  4 40    a    p
##  7:  2  4 50    a    p
##  8:  2  4 60    a    p
##  9:  2  6 10    b    p
## 10:  3  4 40    b    q
## 11:  3  4 50    b    q
## 12:  3  4 60    b    q
## 13:  3  6 20    b    q
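
If you would rather not modify A in place, the same idea could be wrapped in a function. This is only a sketch under some assumptions: roll_fill is a hypothetical name, it spells out the join columns with on= instead of relying on keys, it filters with is.na() rather than the anti-join, and it assumes the join columns identify at most one row of B so that each row of A gets exactly one match.

```r
library(data.table)

A <- data.table(v1 = c(1,1,1,1,1,2,2,2,2,3,3,3,3),
                v2 = c(6,6,6,4,4,6,4,4,4,6,4,4,4),
                t = c(10,20,30,60,60,10,40,50,60,20,40,50,60),
                key = c("v1", "v2", "t"))

B <- data.table(v1 = c(1,1,1,1,2,2,2,2,3,3,3,3),
                v2 = c(4,4,6,6,4,4,6,6,4,4,6,6),
                t = c(10,70,20,70,10,70,20,70,10,70,20,70),
                valA = c('a','a',NA,'a',NA,'a','b','a','b','b',NA,'b'),
                valB = c(NA,'q','q','q','p','p',NA,'p',NA,'q',NA,'q'),
                key = c("v1", "v2", "t"))

# roll_fill (hypothetical helper): one rolling join per value column of B.
roll_fill <- function(A, B, keys, roll = -Inf) {
  out <- copy(A)  # leave the caller's table untouched
  for (col in setdiff(names(B), keys)) {
    # Keep only the rows of B where this column is observed.
    src <- B[!is.na(B[[col]]), c(keys, col), with = FALSE]
    # The join result has one row per row of `out`, in the same order.
    filled <- src[out, on = keys, roll = roll]
    set(out, j = col, value = filled[[col]])
  }
  out[]
}

res <- roll_fill(A, B, keys = c("v1", "v2", "t"))
print(res)
```

The roll argument is passed through unchanged, so a forward roll (roll = Inf) or a limited window should work the same way.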

B[!.(NA_character_), on="valA"] is an anti-join that drops rows with NAs in valA. The code above attempts to generalize this (since the NA needs to match the type of the column).
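
As a quick check of that claim, the sketch below compares the anti-join with a plain is.na() filter on a small made-up table (toy and its columns are illustrative, not from the question). It relies on data.table matching NA to NA in joins, which is what the anti-join above depends on.

```r
library(data.table)

# Toy table (hypothetical data) with NAs in a character column.
toy <- data.table(id = 1:4, valA = c("a", NA, "b", NA))

# Anti-join: drop every row whose valA matches NA_character_.
anti <- toy[!.(NA_character_), on = "valA"]

# The more familiar logical filter.
filt <- toy[!is.na(valA)]

print(anti)
print(filt)
```

Both approaches should keep the same rows in the same order; the anti-join form is convenient inside a loop because the column name can be passed as a string.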
