简体   繁体   中英

R - Change values in some columns of a dataframe using another dataframe

I have following dataframes in R, How can I substitute the test_data$origin_country and test_data$destin_country with the numerical values at country_codes$ID?

test_data <- data.frame(
  origin_country = c('US', 'US', 'DE', 'CN'),
  destin_country = c('DE', 'DE', 'UK', 'IT'),
  year = c(2020, 2020, 2019, 2019),
  item = c('wheat', 'wheat', 'wheat', 'rice'),
  value = c(2000, 2000, 3000, 2500))

country_codes <- data.frame(
  countries = c('CN', 'DE', 'IT', 'UK', 'US'),
  ID = c(1, 2, 3, 4, 5))

I have seen very similar questions, but none tackling this case. My desired result is:

output <- data.frame(
  origin_country = c('5', '5', '2', '1'),
  destin_country = c('2', '2', '4', '3'),
  year = c(2020, 2020, 2019, 2019),
  item = c('wheat', 'wheat', 'wheat', 'rice'),
  value = c(2000, 2000, 3000, 2500))

Truly appreciate your insights!

It is simple

library(dplyr)


test_data %>% 
  mutate(origin_country = country_codes$ID[match(origin_country, country_codes$countries)],
         destin_country = country_codes$ID[match(destin_country, country_codes$countries)])
#>   origin_country destin_country year  item value
#> 1              5              2 2020 wheat  2000
#> 2              5              2 2020 wheat  2000
#> 3              2              4 2019 wheat  3000
#> 4              1              3 2019  rice  2500

Created on 2022-08-16 by the reprex package (v2.0.1)

Maybe even simpler if you take country_codes as named character vector, that you may pass to str_replace like so

library(tidyverse)

test_data <- data.frame(
  origin_country = c('US', 'US', 'DE', 'CN'),
  destin_country = c('DE', 'DE', 'UK', 'IT'),
  year = c(2020, 2020, 2019, 2019),
  item = c('wheat', 'wheat', 'wheat', 'rice'),
  value = c(2000, 2000, 3000, 2500))

country_codes <- data.frame(
  countries = c('CN', 'DE', 'IT', 'UK', 'US'),
  ID = c(1, 2, 3, 4, 5)) 

# convert to named character vector
country_codes <- country_codes %>%
  mutate_at('ID', as.character) %>% 
  deframe() 

test_data %>% 
  mutate_at(c('origin_country', 'destin_country'), ~ str_replace_all(.x, country_codes))
#>   origin_country destin_country year  item value
#> 1              5              2 2020 wheat  2000
#> 2              5              2 2020 wheat  2000
#> 3              2              4 2019 wheat  3000
#> 4              1              3 2019  rice  2500

Created on 2022-08-16 by the reprex package (v2.0.1)

As the ID is 1:5 the output of match could directly be used to give the numbers.

test_data[1:2] <- lapply(test_data[1:2], match, country_codes[,1])
test_data
#  origin_country destin_country year  item value
#1              5              2 2020 wheat  2000
#2              5              2 2020 wheat  2000
#3              2              4 2019 wheat  3000
#4              1              3 2019  rice  2500

In case the numbers are not like in the given example and needed to be taken form column ID you can use.

test_data[1:2] <- country_codes$ID[sapply(test_data[1:2], match, country_codes[,1])]

Or a very simple way using a named vector.

s <- setNames(country_codes$ID, country_codes$countries)
test_data$origin_country <- s[test_data$origin_country]
test_data$destin_country <- s[test_data$destin_country]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM