Merge two dataframes by a partial match in R

I want to merge two data frames by name; however, the names differ slightly between the two data frames. Is there a way to merge them by a partial match? I have tried the answers to other posts but have not gotten the results I need. Thanks.

#Create data frames
df1 <- data.frame(
  "Attending" = c("Kokabi, Nima", "Tong, Frank Charles","Devireddy, Chandan",
                  "Greenbaum, Adam B","Amin, Dina"),
  "Outcome" = rep(1, times = 5),stringsAsFactors = F)

df2 <- data.frame(
  "Credentialed" = c("Kokabi, Nima, MD","Tong, Frank Charles, MD",
                     "Devireddy, Chandanreddy M, MD", "Greenbaum, Adam Brett, MD",
                     "Amin, Dina, DDS"),
  "Status" = rep("Active", times = 5),stringsAsFactors = F)
#Desired result
final <- data.frame(
  "Attending" = c("Kokabi, Nima", "Tong, Frank Charles","Devireddy, 
Chandan","Greenbaum, Adam B","Amin, Dina"),
  "Outcome" = rep(1, times = 5),
  "Credentialed" = c("Kokabi, Nima, MD","Tong, Frank Charles, 
MD","Devireddy, Chandanreddy M, MD", "Greenbaum, Adam Brett, MD","Amin, 
Dina, DDS"),
  "Status" = rep("Active", times = 5)
)

head(final)

Here is a possible solution using grep.

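library(dplyr)  # provides left_join()

# grep() returns matches in df2's row order: this assumes one match per name, same order in both
df1$Credentialed <- grep(paste(df1$Attending, collapse = "|"), df2$Credentialed, value = TRUE)

left_join(df1, df2)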

Joining, by = "Credentialed"
            Attending Outcome                  Credentialed Status
1        Kokabi, Nima       1              Kokabi, Nima, MD Active
2 Tong, Frank Charles       1       Tong, Frank Charles, MD Active
3  Devireddy, Chandan       1 Devireddy, Chandanreddy M, MD Active
4   Greenbaum, Adam B       1     Greenbaum, Adam Brett, MD Active
5          Amin, Dina       1               Amin, Dina, DDS Active
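
The grep call above matches all names at once, so the result lines up with df1 only when both data frames list the same people in the same order. A minimal base R sketch of a row-by-row variant follows, assuming each Credentialed value begins with the corresponding Attending name (true for the sample data, not guaranteed in general). The helper first_match is a hypothetical name introduced here for illustration; it is not part of the original answer.

```r
# Same example data as in the question
df1 <- data.frame(
  Attending = c("Kokabi, Nima", "Tong, Frank Charles", "Devireddy, Chandan",
                "Greenbaum, Adam B", "Amin, Dina"),
  Outcome = rep(1, times = 5), stringsAsFactors = FALSE)

df2 <- data.frame(
  Credentialed = c("Kokabi, Nima, MD", "Tong, Frank Charles, MD",
                   "Devireddy, Chandanreddy M, MD", "Greenbaum, Adam Brett, MD",
                   "Amin, Dina, DDS"),
  Status = rep("Active", times = 5), stringsAsFactors = FALSE)

# Hypothetical helper (assumption, not from the original answer): for one name,
# return the index of the first candidate that starts with it, or NA if none does.
first_match <- function(name, candidates) {
  hits <- which(startsWith(candidates, name))
  if (length(hits) == 0) NA_integer_ else hits[1]
}

# One lookup per row of df1, so the order of df2 no longer matters
idx <- vapply(df1$Attending, first_match, integer(1),
              candidates = df2$Credentialed, USE.NAMES = FALSE)

# Bind the matched df2 rows next to df1; unmatched names get NA in the df2 columns
final <- cbind(df1, df2[idx, , drop = FALSE])
rownames(final) <- NULL
final
```

Because startsWith compares literal text, characters such as "." or "(" in a name are not treated as regular-expression syntax. If a name could match more than one credentialed entry, this sketch silently keeps the first one, which may or may not be what you want.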

Note, I would suggest setting stringsAsFactors = FALSE in your data.frame call. Also note how you pasted the names: a line break inside a quoted string is read by R as a newline character, not as a space, so a name split across two lines will not match the intended name.

df1 <- data.frame(
  "Attending" = c("Kokabi, Nima", "Tong, Frank Charles","Devireddy, Chandan",
                  "Greenbaum, Adam B","Amin, Dina"),
  "Outcome" = rep(1, times = 5),stringsAsFactors = F)

df2 <- data.frame(
  "Credentialed" = c("Kokabi, Nima, MD","Tong, Frank Charles, MD",
                     "Devireddy, Chandanreddy M, MD", "Greenbaum, Adam Brett, MD",
                     "Amin, Dina, DDS"),
  "Status" = rep("Active", times = 5),stringsAsFactors = F)

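The point about line breaks inside quoted strings can be checked directly. The short sketch below uses one name from the question, written with an explicit "\n" to stand for the pasted line break, and shows one possible cleanup with gsub; the variable names are illustrative only.

```r
# What R stores when a name is pasted with a line break inside the quotes
broken <- "Devireddy, \nChandan"

# The newline is part of the string, so it is not the intended name
identical(broken, "Devireddy, Chandan")  # FALSE

# Collapse any run of whitespace (including the newline) into a single space
fixed <- gsub("\\s+", " ", broken)
identical(fixed, "Devireddy, Chandan")   # TRUE
```

Applying the same gsub call to a whole column would normalize the names before any matching is attempted.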