繁体   English   中英

使用 2 个数据帧 (R) 匹配和替换值

[英]Match and replace value using 2 Data Frames (R)

2 个 dfs,需要将 "Name" 与 info$Name 匹配并替换 details$Salary 中的相应值,df - details 应保留所有值并且不应有 NAs(如果找到匹配,则替换值,如果找不到则保持原样)

details<- data.frame(Name = c("Aks","Bob","Caty","David","Enya","Fredrick","Gaby","Hema","Isac","Jaby","Katy"),
                     Age = c(12,22,33,43,24,67,41,19,25,24,32),
                     Gender = c("f","m","m","f","m","f","m","f","m","m","m"),
                     Salary = c(1500,2000,3.6,8500,1.2,1400,2300,2.5,5.2,2000,1265))

info <- data.frame(Name = c("caty","Enya","Dadi","Enta","Billu","Viku","situ","Hema","Ignu","Isac"),
                income = c(2500,5600,3200,1522,2421,3121,4122,5211,1000,3500))   

预期结果 :

Name      Age Gender Salary
Aks       12      f   1500
Bob       22      m   2000
Caty      33      m   2500
David     43      f   8500
Enya      24      m   5600
Fredrick  67      f   1400
Gaby      41      m   2300
Hema      19      f   5211
Isac      25      m   3500
Jaby      24      m   2000
Katy      32      m   1265     

以下均未给出预期结果

dplyr::left_join(details,info,by = "Name") 
dplyr::right_join(details,info,by = "Name") 
dplyr::inner_join(details,info, by ="Name") # for other matching and replace this works fine but not here
dplyr:: full_join(details,info,by ="Name")

所有结果都给出了 NA's ,也尝试使用 match 函数但它没有给出想要的结果,任何帮助将不胜感激

在不同情况下,您在两个数据left_join都有Name ,我们需要首先将它们放在同一个案例中,然后对它们进行left_join并使用coalesce来选择incomesalary之间的第一个非 NA 值。

library(dplyr)

details %>% mutate(Name = stringr::str_to_title(Name)) %>%
  left_join(info %>% mutate(Name = stringr::str_to_title(Name)), by = "Name") %>%
  mutate(Salary = coalesce(income, Salary)) %>%
  select(names(details))

#       Name Age Gender Salary
#1       Aks  12      f   1500
#2       Bob  22      m   2000
#3      Caty  33      m   2500
#4     David  43      f   8500
#5      Enya  24      m   5600
#6  Fredrick  67      f   1400
#7      Gaby  41      m   2300
#8      Hema  19      f   5211
#9      Isac  25      m   3500
#10     Jaby  24      m   2000
#11     Katy  32      m   1265

一个基本的 R 解决方案:


matches <- match(tolower(details$Name), tolower(info$Name))
match <-  !is.na(matches)

details$Salary[match] <- info$income[matches[match]]

#Result
Name Age Gender Salary
1       Aks  12      f   1500
2       Bob  22      m   2000
3      Caty  33      m   2500
4     David  43      f   8500
5      Enya  24      m   5600
6  Fredrick  67      f   1400
7      Gaby  41      m   2300
8      Hema  19      f   5211
9      Isac  25      m   3500
10     Jaby  24      m   2000
11     Katy  32      m   1265

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM