R: Match character vectors

var1 is a character vector:

var1 <- c("tax evasion", "all taxes", "payment")

and var2 is another character vector:

var2 <- c("bill", "income tax", "sales taxes")

I want to compare var1 and var2 and extract the terms that have a partial word match. For example, in this case the desired answer is the following character vector:

"tax evasion", "all taxes", "income tax", "sales taxes"

I tried:

sapply(var1, grep, var2, ignore.case=T,value=T)

but I am not getting the desired answer. How can it be done?

Thanks.
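A quick check with the same data suggests why the attempt above comes back empty: sapply() passes each whole element of var1 (for example "tax evasion") to grep() as the pattern, and no element of var2 contains any of those phrases verbatim. A minimal sketch to confirm this:

```r
var1 <- c("tax evasion", "all taxes", "payment")
var2 <- c("bill", "income tax", "sales taxes")

# Each full phrase of var1 is used as the regex pattern against var2
res <- sapply(var1, grep, var2, ignore.case = TRUE, value = TRUE)

# Every phrase yields character(0), so sapply() cannot simplify the
# results and returns a named list; each count printed here is 0
print(lengths(res))
```

Splitting the phrases into individual words before matching, as both answers below do, is what makes partial matches such as "tax" versus "income tax" possible.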

You can do the following (I use the magrittr package for clarity of the code):

library(magrittr)

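# For every word in u, return the elements of v that contain that word
findIn <- function(u, v)
{
    strsplit(u, ' ') %>%
        unlist %>%
        # lapply() always returns a list; sapply() would return a matrix
        # whenever every word had the same number (> 1) of hits
        lapply(grep, x = v, value = TRUE) %>%
        unlist %>%
        unique
}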

unique(c(findIn(var1, var2), findIn(var2, var1)))
#[1] "income tax"  "sales taxes" "tax evasion" "all taxes"
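If you would rather not depend on magrittr, the same idea can be sketched in base R. This is an assumed rewrite, not part of the original answer: findInBase is a hypothetical name, and it splits on runs of whitespace ("\\s+") instead of a single space so that doubled spaces do not produce empty "words".

```r
var1 <- c("tax evasion", "all taxes", "payment")
var2 <- c("bill", "income tax", "sales taxes")

# Hypothetical base-R equivalent of findIn(): for every word in u,
# collect the elements of v that contain that word (regex match)
findInBase <- function(u, v) {
  words <- unlist(strsplit(u, "\\s+"))
  hits <- lapply(words, grep, x = v, value = TRUE)
  unique(unlist(hits))
}

result <- unique(c(findInBase(var1, var2), findInBase(var2, var1)))
print(result)
#[1] "income tax"  "sales taxes" "tax evasion" "all taxes"
```

Like findIn(), this matching is case-sensitive; passing ignore.case = TRUE through lapply() to grep() would mirror the question's original attempt.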

Maybe you need:

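# Split each phrase into its words
lst1 <- strsplit(var1, ' ')
lst2 <- strsplit(var2, ' ')

# One regex alternation of the other side's words, e.g. "bill|income|tax|sales|taxes"
pat1 <- paste(unlist(lst2), collapse = "|")
pat2 <- paste(unlist(lst1), collapse = "|")

# TRUE for each phrase with at least one word matching the other side's pattern
indx1 <- sapply(lst1, function(x) any(grepl(pat1, x)))
indx2 <- sapply(lst2, function(x) any(grepl(pat2, x)))
c(var1[indx1], var2[indx2])
#[1] "tax evasion" "all taxes"   "income tax"  "sales taxes"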
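One caveat with the pasted "|" pattern: if a word contained a regex metacharacter such as "." or "(", it would be interpreted as regex syntax rather than literally. A minimal sketch of the same per-word check using literal matching (grepl(..., fixed = TRUE)) instead of one combined regex; hasSharedPart is a hypothetical helper name, not from the original answer:

```r
var1 <- c("tax evasion", "all taxes", "payment")
var2 <- c("bill", "income tax", "sales taxes")

lst1 <- strsplit(var1, " ")
lst2 <- strsplit(var2, " ")

# Hypothetical helper: TRUE if any word in `words` contains any word
# from `others` as a literal substring (no regex interpretation)
hasSharedPart <- function(words, others) {
  any(vapply(others, function(w) any(grepl(w, words, fixed = TRUE)),
             logical(1)))
}

indx1 <- vapply(lst1, hasSharedPart, logical(1), others = unlist(lst2))
indx2 <- vapply(lst2, hasSharedPart, logical(1), others = unlist(lst1))
result <- c(var1[indx1], var2[indx2])
print(result)
#[1] "tax evasion" "all taxes"   "income tax"  "sales taxes"
```

The trade-off is one grepl() call per word pair rather than one per phrase, which should only matter for large vectors.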

If var1 and var2 have elements in common, wrap the result with unique(), as @ColonelBeauvel did in his elegant solution.
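To illustrate the overlap case, here is a small sketch with made-up vectors a and b that share the element "income tax". The data and the wrapper name matchTerms are hypothetical; the body simply reuses the second answer's logic.

```r
# Hypothetical wrapper around the second answer's logic
matchTerms <- function(a, b) {
  lst1 <- strsplit(a, " ")
  lst2 <- strsplit(b, " ")
  indx1 <- sapply(lst1, function(x) any(grepl(paste(unlist(lst2), collapse = "|"), x)))
  indx2 <- sapply(lst2, function(x) any(grepl(paste(unlist(lst1), collapse = "|"), x)))
  c(a[indx1], b[indx2])
}

a <- c("income tax", "payment")
b <- c("income tax", "bill")

combined <- matchTerms(a, b)  # "income tax" is returned once from each side
deduped <- unique(combined)
print(combined)
#[1] "income tax" "income tax"
print(deduped)
#[1] "income tax"
```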

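Statement: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.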

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM