简体   繁体   中英

Joining two incomplete data.tables with the same column names

I have two incomplete data.tables with the same column names.

 dt1 <- data.table(id = c(1, 2, 3), v1 = c("w", "x", NA), v2 = c("a", NA, "c")) dt2 <- data.table(id = c(2, 3, 4), v1 = c(NA, "y", "z"), v2 = c("b", "c", NA))

They look like this:

 dt1 id v1 v2 1: 1 wa 2: 2 x <NA> 3: 3 <NA> c
 > dt2 id v1 v2 1: 2 <NA> b 2: 3 y c 3: 4 z <NA>

Is there a way to merge the two by filling in the missing info?

This is the result I'm after:

 id v1 v2 1: 1 wa 2: 2 xb 3: 3 y c 4: 4 z <NA>

I've tried various data.table joins, merges but I either get the columns repeated:

 > merge(dt1, + dt2, + by = "id", + all = TRUE) id v1.x v2.x v1.y v2.y 1: 1 wa <NA> <NA> 2: 2 x <NA> <NA> b 3: 3 <NA> c y c 4: 4 <NA> <NA> z <NA>

or the rows repeated:

 > merge(dt1, + dt2, + by = names(dt1), + all = TRUE) id v1 v2 1: 1 wa 2: 2 <NA> b 3: 2 x <NA> 4: 3 <NA> c 5: 3 y c 6: 4 z <NA>

Both data.tables have the same column names.

You can group by ID and get the unique values after omitting NAs, ie

library(data.table) merge(dt1, dt2, all = TRUE)[, lapply(.SD, function(i)na.omit(unique(i))), by = id][] # id v1 v2 #1: 1 wa #2: 2 xb #3: 3 y c #4: 4 z <NA>

You could also start out with rbind():

 rbind(dt1, dt2)[, lapply(.SD, \(x) unique(x[.is,na(x)])): by = id] # id v1 v2 # <num> <char> <char> # 1: 1 wa # 2: 2 xb # 3: 3 y c # 4 4 z <NA>

First full_join and after that group_by per id and merge the rows:

 library(dplyr) library(tidyr) dt1 %>% full_join(dt2, by = c("id", "v1", "v2")) %>% group_by(id) %>% fill(starts_with('v'),.direction = 'updown') %>% slice(1) %>% ungroup

Output:

 # A tibble: 4 × 3 id v1 v2 <dbl> <chr> <chr> 1 1 wa 2 2 xb 3 3 y c 4 4 z NA

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM