
tidyr join an ID table with main table across multiple columns

This seems like a very basic operation, but my searches are not finding a simple solution. As an example of what I am trying to do, consider the following two data frames from a database. First an ID table that assigns an index to a color name:

library(dplyr)  # provides tibble(), the join verbs and %>%

ColorID <- tibble(ID = 1:4, Name = c("Red", "Green", "Blue", "Black"))

ColorID
# A tibble: 4 x 2
     ID Name 
  <int> <chr>
1     1 Red  
2     2 Green
3     3 Blue 
4     4 Black

Next some table that points to these color indexes (instead of storing text strings):

Widgets <- tibble(Front = c(1,3,4,2,1,1), Back = c(4,4,3,3,1,2), 
                  Top = c(4,3,2,1,2,3), Bottom = c(1,2,3,4,3,2))
Widgets
# A tibble: 6 x 4
  Front  Back   Top Bottom
  <dbl> <dbl> <dbl>  <dbl>
1     1     4     4      1
2     3     4     3      2
3     4     3     2      3
4     2     3     1      4
5     1     1     2      3
6     1     2     3      2

Now I just want to join the two tables to substitute the index values with the actual color names, so what I want is:

Joined <- tibble(Front = c("Red", "Blue", "Black", "Green", "Red","Red"),
                 Back = c("Black", "Black", "Blue","Blue", "Red", "Green"),
                 Top = c("Black","Blue", "Green", "Red", "Green", "Blue"),
                 Bottom = c("Red", "Green", "Blue", "Black", "Blue","Green"))
Joined
# A tibble: 6 x 4
  Front Back  Top   Bottom
  <chr> <chr> <chr> <chr> 
1 Red   Black Black Red   
2 Blue  Black Blue  Green 
3 Black Blue  Green Blue  
4 Green Blue  Red   Black 
5 Red   Red   Green Blue  
6 Red   Green Blue  Green 

I've tried many iterations with no success. What I thought would work is something like:

J <- Widgets %>% inner_join(ColorID, by = c(. = "ID"))

I can tackle this column by column, joining on one variable at a time, e.g.:

J <- Widgets %>% inner_join(ColorID, by = c("Front" = "ID"))

That doesn't replace "Front"; instead it creates a new "Name" column. It seems like there has to be a simple solution to this, though. Thanks.

Does this work?

library(dplyr)
library(tidyr)

Widgets %>% pivot_longer(everything()) %>% 
  inner_join(ColorID, by = c('value' = 'ID')) %>% select(-value) %>% 
    pivot_wider(names_from = name, values_from = Name) %>% unnest(everything())
# A tibble: 6 x 4
  Front Back  Top   Bottom
  <chr> <chr> <chr> <chr> 
1 Red   Black Black Red   
2 Blue  Black Blue  Green 
3 Black Blue  Green Blue  
4 Green Blue  Red   Black 
5 Red   Red   Green Blue  
6 Red   Green Blue  Green 
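
A possible variant of the same pivot-and-join idea, offered only as a sketch: because pivot_longer(everything()) leaves nothing to identify the original rows, pivot_wider() has to fall back to list-columns (as far as I know it also warns that the values are not uniquely identified), which is why the unnest() step is needed. Carrying an explicit row number through the pipeline avoids that. The `row` column is a helper name introduced here for illustration; it is not part of the original data.

```r
library(dplyr)
library(tidyr)

ColorID <- tibble(ID = 1:4, Name = c("Red", "Green", "Blue", "Black"))
Widgets <- tibble(Front = c(1,3,4,2,1,1), Back = c(4,4,3,3,1,2),
                  Top = c(4,3,2,1,2,3), Bottom = c(1,2,3,4,3,2))

Joined <- Widgets %>%
  mutate(row = row_number()) %>%                # helper column (assumed name) that remembers the source row
  pivot_longer(-row) %>%                        # one line per (row, column name, ID value)
  inner_join(ColorID, by = c("value" = "ID")) %>%
  select(-value) %>%                            # keep the color name, drop the numeric ID
  pivot_wider(names_from = name, values_from = Name) %>%
  select(-row)                                  # the helper column is no longer needed

Joined
```

Note that inner_join() drops any cell whose ID has no entry in ColorID; left_join() would keep it as NA instead.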

There is no need for join functions:

library(dplyr)

ColorID <- tibble(ID = c(1:4), Name = c("Red", "Green", "Blue", "Black"))
# reorder so that row number and ID are different
ColorID <- ColorID[c(2, 1, 4, 3), ] 

Widgets <- tibble(Front = c(1,3,4,2,1,1), Back = c(4,4,3,3,1,2), 
                  Top = c(4,3,2,1,2,3), Bottom = c(1,2,3,4,3,2))

check_id <- function(col){
  ColorID$Name[match(col, ColorID$ID)]
}

Widgets %>% 
  mutate(across(everything(), check_id))

# A tibble: 6 x 4
  Front Back  Top   Bottom
  <chr> <chr> <chr> <chr> 
1 Red   Black Black Red   
2 Blue  Black Blue  Green 
3 Black Blue  Green Blue  
4 Green Blue  Red   Black 
5 Red   Red   Green Blue  
6 Red   Green Blue  Green 

(Edited) Inside mutate(), across() applies check_id to every column of Widgets. For each number in a column, match() returns the position of that number in ColorID$ID, i.e. the row of ColorID that holds the corresponding name. Indexing ColorID$Name with those positions then extracts the names.
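
To make the match() step concrete, here is a small base-R sketch for a single column. It assumes the reordered ColorID from this answer; `front`, `rows` and `front_names` are names introduced here only for illustration.

```r
# ColorID in the reordered form used in this answer (row number != ID)
ColorID <- data.frame(ID = c(2L, 1L, 4L, 3L),
                      Name = c("Green", "Red", "Black", "Blue"),
                      stringsAsFactors = FALSE)

front <- c(1, 3, 4, 2, 1, 1)          # the Front column of Widgets

# Position of each ID within ColorID$ID, i.e. which row of ColorID to read
rows <- match(front, ColorID$ID)
rows

# Use those positions to pull out the names
front_names <- ColorID$Name[rows]
front_names

# An ID that is missing from ColorID yields NA rather than an error
ColorID$Name[match(5, ColorID$ID)]
```

One consequence of this design is that unknown IDs silently become NA, so it may be worth checking the result with anyNA() if the ID table could be incomplete.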
