Match and replace value using 2 Data Frames (R)

I have two data frames. I need to match details$Name against info$Name and replace the corresponding values in details$Salary with info$income. details should retain all of its rows, and there should be no NAs: if a match is found, replace the value; if not, leave it as it is.

details<- data.frame(Name = c("Aks","Bob","Caty","David","Enya","Fredrick","Gaby","Hema","Isac","Jaby","Katy"),
                     Age = c(12,22,33,43,24,67,41,19,25,24,32),
                     Gender = c("f","m","m","f","m","f","m","f","m","m","m"),
                     Salary = c(1500,2000,3.6,8500,1.2,1400,2300,2.5,5.2,2000,1265))

info <- data.frame(Name = c("caty","Enya","Dadi","Enta","Billu","Viku","situ","Hema","Ignu","Isac"),
                income = c(2500,5600,3200,1522,2421,3121,4122,5211,1000,3500))   

Expected result:

Name      Age Gender Salary
Aks       12      f   1500
Bob       22      m   2000
Caty      33      m   2500
David     43      f   8500
Enya      24      m   5600
Fredrick  67      f   1400
Gaby      41      m   2300
Hema      19      f   5211
Isac      25      m   3500
Jaby      24      m   2000
Katy      32      m   1265     

None of the following gives the expected result:

dplyr::left_join(details, info, by = "Name")
dplyr::right_join(details, info, by = "Name")
dplyr::inner_join(details, info, by = "Name") # works fine for other match-and-replace cases, but not here
dplyr::full_join(details, info, by = "Name")

All of these results contain NAs. I also tried the match function, but it did not give the desired result. Any help would be highly appreciated.

Name appears in both data frames but in different letter cases (for example "Caty" in details and "caty" in info). We first need to bring them to the same case, then left_join the two data frames and use coalesce to select the first non-NA value between income and Salary.

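library(dplyr)

# Convert Name to title case in both data frames so that "caty" matches "Caty"
details %>% mutate(Name = stringr::str_to_title(Name)) %>%
  left_join(info %>% mutate(Name = stringr::str_to_title(Name)), by = "Name") %>%
  # Take income where a match was found; otherwise keep the original Salary
  mutate(Salary = coalesce(income, Salary)) %>%
  # Drop the helper income column and restore the original column order
  select(names(details))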

#       Name Age Gender Salary
#1       Aks  12      f   1500
#2       Bob  22      m   2000
#3      Caty  33      m   2500
#4     David  43      f   8500
#5      Enya  24      m   5600
#6  Fredrick  67      f   1400
#7      Gaby  41      m   2300
#8      Hema  19      f   5211
#9      Isac  25      m   3500
#10     Jaby  24      m   2000
#11     Katy  32      m   1265
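
To see what coalesce contributes in the pipeline above, here is a minimal sketch on two hand-made vectors. The vectors `income` and `salary` are standalone illustrations built from the first three rows of the example, not the actual data frame columns. coalesce works position by position and returns the first value that is not NA, so rows without a match fall back to the original salary.

```r
library(dplyr)

# After a left join, rows with no match in info carry NA in income.
income <- c(NA, NA, 2500)      # Aks, Bob: no match; Caty: matched
salary <- c(1500, 2000, 3.6)   # original Salary values for the same rows

# Element-wise: use income where present, otherwise keep salary.
updated <- coalesce(income, salary)
print(updated)
```

The argument order matters: coalesce(salary, income) would always keep the original salary here, because salary has no NAs.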

A base R solution:

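# For each name in details, the row position of the same name in info (NA if absent),
# compared case-insensitively
matches <- match(tolower(details$Name), tolower(info$Name))
# TRUE for rows of details that have a match (named `found` to avoid masking match())
found <- !is.na(matches)

# Overwrite Salary only for the matched rows
details$Salary[found] <- info$income[matches[found]]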

# Result
       Name Age Gender Salary
1       Aks  12      f   1500
2       Bob  22      m   2000
3      Caty  33      m   2500
4     David  43      f   8500
5      Enya  24      m   5600
6  Fredrick  67      f   1400
7      Gaby  41      m   2300
8      Hema  19      f   5211
9      Isac  25      m   3500
10     Jaby  24      m   2000
11     Katy  32      m   1265
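
The match-based approach can also be wrapped in a small function so the original data frame is not modified in place. The sketch below is one possible way to do that; `replace_matched` and its parameter names are hypothetical helpers invented for illustration, and the sample data is a cut-down version of the example above.

```r
# Hypothetical helper (not part of base R): copy values from a lookup table
# into a target vector wherever the keys match, ignoring letter case.
replace_matched <- function(target_keys, target_values, lookup_keys, lookup_values) {
  # Position of each target key among the lookup keys; NA where there is no match.
  # match() returns only the first hit, so later duplicate lookup keys are ignored.
  idx <- match(tolower(target_keys), tolower(lookup_keys))
  found <- !is.na(idx)
  # Replace only the matched positions; unmatched values stay as they were.
  target_values[found] <- lookup_values[idx[found]]
  target_values
}

keys   <- c("Aks", "Caty", "Enya")
salary <- c(1500, 3.6, 1.2)
lookup <- data.frame(Name = c("caty", "Enya", "Dadi"),
                     income = c(2500, 5600, 3200))

result <- replace_matched(keys, salary, lookup$Name, lookup$income)
print(result)
```

With the full example data, this would be called as details$Salary <- replace_matched(details$Name, details$Salary, info$Name, info$income).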
