
Replace values in a data table based on several columns in lookup table

Given a data table df with, among other columns, a code column and a version column, I would like to replace the values in the code column based on a lookup table look.

library(data.table)

df <- data.table(year = c("1951", "1951", "1951", "1951", "1951"),
                 region = c(10, 11, 12, 18, 4),
                 code = c("140", "140", "140", "1403", "1404"),
                 version = c(6, 7, 8, 9, 9))

   year region code version
1: 1951     10  140       6
2: 1951     11  140       7
3: 1951     12  140       8
4: 1951     18 1403       9
5: 1951      4 1404       9

look <- data.table(code = c("C00", "C000", "C001", "C002", "C003", "C004"),
                   ver67 = c(140L, 1400L, 1401L, NA, NA, NA),
                   ver8 = c(140L, 1400L, 1401L, NA, NA, NA),
                   ver9 = c(140L, 1400L, 1401L, NA, 1403L, 1404L))

   code ver67 ver8 ver9
1:  C00   140  140  140
2: C000  1400 1400 1400
3: C001  1401 1401 1401
4: C002    NA   NA   NA
5: C003    NA   NA 1403
6: C004    NA   NA 1404

The code values in df should be replaced with the code values from look wherever the old code and the corresponding version column match, as below.

   year region code
1: 1951     10  C00
2: 1951     11  C00
3: 1951     12  C00
4: 1951     18 C003
5: 1951      4 C004

I am not quite sure how to tackle this challenge and would love some input to get me started.

It is probably easiest to first transform your lookup table into long format and then use left_join to attach the lookup values to the table (assuming you are fine with using the tidyverse rather than sticking strictly to data.table):

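library(data.table)
library(tidyverse)

look_long <- look %>%
  # one row per (code, version column, old value)
  pivot_longer(starts_with("ver"), names_to = "ver") %>%
  # drop combinations that have no old code
  drop_na() %>%
  # "ver67" -> c("6", "7"), so a combined column yields one row per version
  mutate(ver = str_split(str_remove(ver, "ver"), "")) %>%
  unnest(ver) %>%
  # align the column types with df (version is numeric, code is character)
  transmute(ver = as.integer(ver),
            value = as.character(value),
            newcode = code)

# match on the old code and the version; the new code arrives as newcode
df %>%
  left_join(look_long, by = c("code" = "value", "version" = "ver"))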
#>    year region code version newcode
#> 1: 1951     10  140       6     C00
#> 2: 1951     11  140       7     C00
#> 3: 1951     12  140       8     C00
#> 4: 1951     18 1403       9    C003
#> 5: 1951      4 1404       9    C004

Created on 2022-06-30 by the reprex package (v2.0.1)
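
If you would rather stay within data.table, the same idea (reshape the lookup table to long format, then join on old code and version) can be sketched roughly as follows. This is only a sketch under the assumption that a column name such as ver67 means "versions 6 and 7", as the example data suggests; the names look_long, newcode and old are illustrative and not part of the question. It overwrites code in place and drops version, which gives the output requested in the question.

```r
library(data.table)

# Example data from the question.
df <- data.table(year = rep("1951", 5),
                 region = c(10, 11, 12, 18, 4),
                 code = c("140", "140", "140", "1403", "1404"),
                 version = c(6, 7, 8, 9, 9))
look <- data.table(code = c("C00", "C000", "C001", "C002", "C003", "C004"),
                   ver67 = c(140L, 1400L, 1401L, NA, NA, NA),
                   ver8 = c(140L, 1400L, 1401L, NA, NA, NA),
                   ver9 = c(140L, 1400L, 1401L, NA, 1403L, 1404L))

# Long format: one row per (new code, version column, old code), NAs dropped.
look_long <- melt(look, id.vars = "code", variable.name = "ver",
                  value.name = "old", na.rm = TRUE, variable.factor = FALSE)

# Assumption: each digit after "ver" is one version, so "ver67" -> 6 and 7.
# Types are aligned with df here (old as character, version as numeric).
look_long <- look_long[,
  .(version = as.numeric(strsplit(sub("^ver", "", ver), "")[[1]])),
  by = .(newcode = code, old = as.character(old), ver)]

# Update join: overwrite df$code wherever old code and version both match.
df[look_long, on = .(code = old, version), code := i.newcode]

# Drop the version column to match the requested output.
df[, version := NULL]
print(df)
```

Rows of df without a match in look_long keep their original code, so it may be worth checking for leftover numeric codes afterwards.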
