Merge two dataframes in R based on string (name)

I have two data frames that I need to merge on the candidate and constituency columns. The problem is that the names are spelled differently in the two data frames.

For example, one data frame has Dr. Ashutosh Singh where the other has Dr Ashutosh Singh, and one has Dr. Vikash Singh where the other has just Vikash Singh.

I'm attaching screenshots of both data frames.

[Screenshot: first data frame]

[Screenshot: second data frame]

I have to map the first data frame's columns CAND_NAME and AC_NAME to the second data frame's candidate and constituency columns, respectively, and merge the two into one.

I'm sharing the Excel file and the R code too. I have to merge the three sheets into one.
Link for the Excel file

R code:

setwd("/home/lenovo/Documents/r_prog/")
library(readxl)

candidate2017=read_excel("LA 2017.xlsx", sheet = 1)
electors2017=read_excel("LA 2017.xlsx", sheet = 2)

ManipurCandidates2017ADR=read_excel("LA 2017.xlsx", sheet = 3)

ManipurCandidate2017=candidate2017[grepl("Manipur", candidate2017$ST_NAME),]
ManipurElectors2017=electors2017[grepl("Manipur", electors2017$ST_NAME),]


ManipurElectors2017 = data.frame(lapply(ManipurElectors2017, function(v) {
  if (is.character(v)) return(toupper(v))
  else return(v)
}))

ManipurCandidates2017ADR = data.frame(lapply(ManipurCandidates2017ADR, function(v) {
  if (is.character(v)) return(toupper(v))
  else return(v)
}))

ManipurCandidate2017 = data.frame(lapply(ManipurCandidate2017, function(v) {
  if (is.character(v)) return(toupper(v))
  else return(v)
}))


View(ManipurCandidate2017)
View(ManipurElectors2017)
View(ManipurCandidates2017ADR)

mergedData = merge(ManipurCandidate2017,ManipurCandidates2017ADR , 
              by.x=c('CAND_NAME'), by.y=c('Candidate'), all = TRUE)

I am new to R, please help. Thanks in advance.

A possible solution is approximate string matching (fuzzy matching). Check out the agrep() function. You can of course embed agrep() into a merge() call. I cannot write the full code since you did not provide a reproducible example.
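As a small illustration of what agrep() does, here is a minimal sketch. The vector adr_names is made up for this example (only the two "Singh" spellings come from the question), and the tolerance of 0.1 is just base R's default written out explicitly.

```r
# Hypothetical sample of names as they might appear in the second data frame.
adr_names <- c("DR ASHUTOSH SINGH", "VIKASH SINGH", "RAM KUMAR")

# agrep() returns the indices of the elements that contain an approximate
# match to the pattern. With max.distance = 0.1 the allowed number of edits
# is 10% of the pattern length, rounded up.
hits <- agrep("DR. ASHUTOSH SINGH", adr_names, max.distance = 0.1)

# Look up the matching name(s).
adr_names[hits]
```

Note that agrep() can return zero, one, or several indices, so the result should be checked before it is used to subset a data frame.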

The call would look something like this:

dat3 <- merge(x=dat1,y = dat2[agrep(dat1$ID1[1],dat2$ID2),],all=TRUE)
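Since there is no reproducible example, the following is only a sketch of one way to apply the idea to every row, not a tested solution for the actual Excel file. It normalises the names first (upper case, no punctuation, no honorifics) and then, within the same constituency, picks the closest name by edit distance using adist() from base R. The sample data, the Constituency and Assets column names, the list of honorifics, and the distance threshold of 2 are all assumptions made for illustration.

```r
# Hypothetical stand-ins for the two sheets (not the real data).
cand <- data.frame(
  CAND_NAME = c("DR. ASHUTOSH SINGH", "DR. VIKASH SINGH", "A. KUMARR", "B. DEVI"),
  AC_NAME   = c("ALPHA", "BETA", "BETA", "GAMMA"),
  stringsAsFactors = FALSE
)
adr <- data.frame(
  Candidate    = c("Vikash Singh", "Dr Ashutosh Singh", "A Kumar"),
  Constituency = c("Beta", "Alpha", "Beta"),
  Assets       = c(20, 10, 30),
  stringsAsFactors = FALSE
)

# Upper-case, replace punctuation with spaces, drop a few honorifics
# (assumed list), and squeeze repeated whitespace.
normalise_name <- function(x) {
  x <- toupper(x)
  x <- gsub("[[:punct:]]", " ", x, perl = TRUE)
  x <- gsub("\\b(DR|MR|MRS|SHRI|SMT)\\b", " ", x, perl = TRUE)
  x <- gsub("\\s+", " ", x, perl = TRUE)
  trimws(x)
}

cand$name_key <- normalise_name(cand$CAND_NAME)
cand$ac_key   <- toupper(trimws(cand$AC_NAME))
adr$name_key  <- normalise_name(adr$Candidate)
adr$ac_key    <- toupper(trimws(adr$Constituency))

# For each candidate, search only rows of `adr` in the same constituency and
# keep the closest name if it is within 2 edits (assumed threshold).
match_idx <- vapply(seq_len(nrow(cand)), function(i) {
  pool <- which(adr$ac_key == cand$ac_key[i])
  if (length(pool) == 0) return(NA_integer_)
  d <- adist(cand$name_key[i], adr$name_key[pool])
  best <- which.min(d)
  if (d[best] <= 2) pool[best] else NA_integer_
}, integer(1))

# Left join: every row of `cand`, with NA where no acceptable match was found.
merged <- data.frame(
  cand,
  adr[match_idx, c("Candidate", "Constituency", "Assets")],
  row.names = NULL
)
```

Restricting the search to the same constituency keeps unrelated candidates with similar names from being paired. Rows left as NA are the ones worth checking by hand, and the threshold may need tuning for the real data.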
