R - Change values in some columns of a dataframe using another dataframe

Question

I have the following data frames in R. How can I replace test_data$origin_country and test_data$destin_country with the numeric values from country_codes$ID?

test_data <- data.frame(
  origin_country = c('US', 'US', 'DE', 'CN'),
  destin_country = c('DE', 'DE', 'UK', 'IT'),
  year = c(2020, 2020, 2019, 2019),
  item = c('wheat', 'wheat', 'wheat', 'rice'),
  value = c(2000, 2000, 3000, 2500))

country_codes <- data.frame(
  countries = c('CN', 'DE', 'IT', 'UK', 'US'),
  ID = c(1, 2, 3, 4, 5))

I have seen very similar questions, but none of them solves this problem. The result I want is:

output <- data.frame(
  origin_country = c('5', '5', '2', '1'),
  destin_country = c('2', '2', '4', '3'),
  year = c(2020, 2020, 2019, 2019),
  item = c('wheat', 'wheat', 'wheat', 'rice'),
  value = c(2000, 2000, 3000, 2500))

Thank you very much for your insights!

Answer 1

This is straightforward with match:

library(dplyr)


test_data %>% 
  mutate(origin_country = country_codes$ID[match(origin_country, country_codes$countries)],
         destin_country = country_codes$ID[match(destin_country, country_codes$countries)])
#>   origin_country destin_country year  item value
#> 1              5              2 2020 wheat  2000
#> 2              5              2 2020 wheat  2000
#> 3              2              4 2019 wheat  3000
#> 4              1              3 2019  rice  2500

Created on 2022-08-16 by the reprex package (v2.0.1)
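One thing to watch with match: a code that is missing from country_codes comes back as NA instead of raising an error. A minimal base R sketch, assuming a hypothetical code 'FR' that is not in the lookup table (the names codes, pos, ids and unmatched are illustrative):

```r
# Lookup table from the question
country_codes <- data.frame(
  countries = c('CN', 'DE', 'IT', 'UK', 'US'),
  ID = c(1, 2, 3, 4, 5))

# Hypothetical input; 'FR' is deliberately absent from country_codes
codes <- c('US', 'FR', 'DE')

# match() returns the row position of each code, or NA when there is no match
pos <- match(codes, country_codes$countries)
ids <- country_codes$ID[pos]

# Collect the codes that could not be translated
unmatched <- codes[is.na(pos)]

print(ids)
print(unmatched)
```

Checking unmatched before overwriting the columns is one way to avoid silently introducing NA values.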

Answer 2

It may be simpler to turn country_codes into a named character vector, which you can then pass to str_replace_all like this:

library(tidyverse)

test_data <- data.frame(
  origin_country = c('US', 'US', 'DE', 'CN'),
  destin_country = c('DE', 'DE', 'UK', 'IT'),
  year = c(2020, 2020, 2019, 2019),
  item = c('wheat', 'wheat', 'wheat', 'rice'),
  value = c(2000, 2000, 3000, 2500))

country_codes <- data.frame(
  countries = c('CN', 'DE', 'IT', 'UK', 'US'),
  ID = c(1, 2, 3, 4, 5)) 

# convert to named character vector
country_codes <- country_codes %>%
  mutate_at('ID', as.character) %>% 
  deframe() 

test_data %>% 
  mutate_at(c('origin_country', 'destin_country'), ~ str_replace_all(.x, country_codes))
#>   origin_country destin_country year  item value
#> 1              5              2 2020 wheat  2000
#> 2              5              2 2020 wheat  2000
#> 3              2              4 2019 wheat  3000
#> 4              1              3 2019  rice  2500

Created on 2022-08-16 by the reprex package (v2.0.1)
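Note that str_replace_all treats the names of the vector as regular expressions and matches them anywhere in the string, so a value that merely contains a code is changed as well. The sketch below uses a hypothetical value 'USA' to show this, and anchors the patterns with ^ and $ as one possible guard. The names lookup and anchored are illustrative and not part of the answer above.

```r
library(stringr)

# Named character vector: names are patterns, values are replacements
lookup <- c(CN = '1', DE = '2', IT = '3', UK = '4', US = '5')

# 'USA' is a hypothetical value that only contains the code 'US'
partial <- str_replace_all('USA', lookup)

# Anchor each pattern so that only whole-string matches are replaced
anchored <- setNames(lookup, paste0('^', names(lookup), '$'))
exact <- str_replace_all(c('US', 'USA'), anchored)

print(partial)
print(exact)
```

With the two-letter codes in the question this makes no difference, but it may matter if the columns hold longer strings.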

Answer 3

Because ID is simply 1:5 here, the positions returned by match are already the IDs and can be used directly.

test_data[1:2] <- lapply(test_data[1:2], match, country_codes[,1])
test_data
#  origin_country destin_country year  item value
#1              5              2 2020 wheat  2000
#2              5              2 2020 wheat  2000
#3              2              4 2019 wheat  3000
#4              1              3 2019  rice  2500

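If the IDs differ from those in the example and have to be taken from the ID column, you can use the following (applied to the original test_data):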

test_data[1:2] <- country_codes$ID[sapply(test_data[1:2], match, country_codes[,1])]
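The same lookup can also be written column by column, which may be easier to read than indexing with the matrix that sapply returns. This is a sketch under assumptions: the IDs 10 to 50 are hypothetical, chosen so that they differ from the row positions, and lookup_id is an illustrative helper name.

```r
test_data <- data.frame(
  origin_country = c('US', 'US', 'DE', 'CN'),
  destin_country = c('DE', 'DE', 'UK', 'IT'))

# Hypothetical IDs that are not equal to the row positions 1:5
country_codes <- data.frame(
  countries = c('CN', 'DE', 'IT', 'UK', 'US'),
  ID = c(10, 20, 30, 40, 50))

# Translate one column: find each code's row, then pick that row's ID
lookup_id <- function(x) country_codes$ID[match(x, country_codes$countries)]

# Apply the helper to the first two columns and write the results back
test_data[1:2] <- lapply(test_data[1:2], lookup_id)

print(test_data)
```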

Or, very simply, use a named vector:

s <- setNames(country_codes$ID, country_codes$countries)
test_data$origin_country <- s[test_data$origin_country]
test_data$destin_country <- s[test_data$destin_country]
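The named-vector lookup can be applied to both columns in one step. The sketch below assumes the columns might be factors (as they would be with stringsAsFactors = TRUE), so it converts them with as.character first, because indexing by a factor uses its integer codes and not its labels. It also drops the names that the lookup carries along. The name cols is illustrative.

```r
test_data <- data.frame(
  origin_country = c('US', 'US', 'DE', 'CN'),
  destin_country = c('DE', 'DE', 'UK', 'IT'),
  year = c(2020, 2020, 2019, 2019))

country_codes <- data.frame(
  countries = c('CN', 'DE', 'IT', 'UK', 'US'),
  ID = c(1, 2, 3, 4, 5))

# Named vector: names are country codes, values are IDs
s <- setNames(country_codes$ID, country_codes$countries)

cols <- c('origin_country', 'destin_country')

# Look up by name; as.character() guards against factor columns,
# unname() removes the country-code names from the result
test_data[cols] <- lapply(test_data[cols], function(x) unname(s[as.character(x)]))

print(test_data)
```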
