Joining two incomplete data.tables with the same column names

Question

I have two incomplete data.tables with the same column names.

 dt1 <- data.table(id = c(1, 2, 3), v1 = c("w", "x", NA), v2 = c("a", NA, "c")) dt2 <- data.table(id = c(2, 3, 4), v1 = c(NA, "y", "z"), v2 = c("b", "c", NA))

They look like this:

 dt1 id v1 v2 1: 1 wa 2: 2 x <NA> 3: 3 <NA> c

 > dt2 id v1 v2 1: 2 <NA> b 2: 3 y c 3: 4 z <NA>

Is there a way to merge the two by filling in the missing info?

This is the result I'm after:

 id v1 v2 1: 1 wa 2: 2 xb 3: 3 y c 4: 4 z <NA>

I've tried various data.table joins, merges but I either get the columns repeated:

 > merge(dt1, + dt2, + by = "id", + all = TRUE) id v1.x v2.x v1.y v2.y 1: 1 wa <NA> <NA> 2: 2 x <NA> <NA> b 3: 3 <NA> c y c 4: 4 <NA> <NA> z <NA>

or the rows repeated:

 > merge(dt1, + dt2, + by = names(dt1), + all = TRUE) id v1 v2 1: 1 wa 2: 2 <NA> b 3: 2 x <NA> 4: 3 <NA> c 5: 3 y c 6: 4 z <NA>

Both data.tables have the same column names.

Answer 1

You can group by ID and get the unique values after omitting NAs, ie

library(data.table) merge(dt1, dt2, all = TRUE)[, lapply(.SD, function(i)na.omit(unique(i))), by = id][] # id v1 v2 #1: 1 wa #2: 2 xb #3: 3 y c #4: 4 z <NA>

Answer 2

You could also start out with rbind():

 rbind(dt1, dt2)[, lapply(.SD, \(x) unique(x[.is,na(x)])): by = id] # id v1 v2 # <num> <char> <char> # 1: 1 wa # 2: 2 xb # 3: 3 y c # 4 4 z <NA>

Answer 3

First full_join and after that group_by per id and merge the rows:

 library(dplyr) library(tidyr) dt1 %>% full_join(dt2, by = c("id", "v1", "v2")) %>% group_by(id) %>% fill(starts_with('v'),.direction = 'updown') %>% slice(1) %>% ungroup

Output:

 # A tibble: 4 × 3 id v1 v2 <dbl> <chr> <chr> 1 1 wa 2 2 xb 3 3 y c 4 4 z NA

Joining two incomplete data.tables with the same column names

Question

3 answers

solution1
3 ACCPTED 2022-07-04 09:52:52

solution2
3 2022-07-04 10:06:29

solution3
0 2022-07-04 10:12:18

Joining two incomplete data.tables with the same column names

Question

3 answers

solution1 3 ACCPTED 2022-07-04 09:52:52

solution2 3 2022-07-04 10:06:29

solution3 0 2022-07-04 10:12:18

solution1
3 ACCPTED 2022-07-04 09:52:52

solution2
3 2022-07-04 10:06:29

solution3
0 2022-07-04 10:12:18