R match ignore case and special characters

I have searched and found similar answers, but not exactly what I need.

I want to identify matches between 2 character vectors, ignoring case, spaces, and special characters.

list1 <- c('a', 'b', 'c')
list2 <- c('A', 'B', 'C')
list3 <- c('a-', 'B_', '- c')

All of the calls below should give the same output (1 2 3):

match(list1, list1)
match(list1, list2)
match(list1, list3)

I have tried str_detect(list1, regex(list2, ignore_case = TRUE)), but that doesn't give the same type of output, and I don't know how to incorporate the special characters/spaces in there.

You can create a regex that pulls out only the letters in the middle of the strings using gsub, and then convert them to lowercase. You can then use standard match on the result. It is best to put all this in its own function:

list1 <- c('a', 'b', 'c')
list2 <- c('A', 'B', 'C')
list3 <- c('a-', 'B_', '- c')

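match2 <- function(a, b)
{
  # Keep the first run of letters in each string, then lowercase it.
  # The leading group is [^[:alpha:]]* rather than a greedy (.*), which
  # would swallow all but the last letter of a multi-letter string.
  pattern <- "^[^[:alpha:]]*([[:alpha:]]+).*$"
  a <- tolower(gsub(pattern, "\\1", a))
  b <- tolower(gsub(pattern, "\\1", b))
  match(a, b)
}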

match2(list1, list1)
#> [1] 1 2 3
match2(list1, list2)
#> [1] 1 2 3
match2(list1, list3)
#> [1] 1 2 3

Created on 2020-02-21 by the reprex package (v0.3.0)
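One thing worth checking with this kind of capture-group pattern is how it behaves on strings with more than one letter. The sketch below is only an illustration (the test vector x and the names greedy and first_run are made up for this example): a greedy leading (.*) grabs as much as it can and leaves just the last letter for the letter group, while restricting the leading part to non-letters keeps the whole first run of letters.

```r
# Hypothetical inputs with multi-letter content, not from the question.
x <- c("abc-", "_Foo ", "- c")

# Greedy leading group: (.*) takes everything up to the last letter,
# so only that last letter ends up in group 2.
greedy <- gsub("(.*)([[:alpha:]]+)(.*)", "\\2", x)

# Leading part limited to non-letters: the first run of letters is kept whole.
first_run <- gsub("^[^[:alpha:]]*([[:alpha:]]+).*$", "\\1", x)

print(greedy)
#> [1] "c" "o" "c"
print(first_run)
#> [1] "abc" "Foo" "c"
```

For the single-letter vectors in the question both patterns give the same result, so the difference only shows up on longer strings.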

I see that @Allan Cameron posted a very similar solution right before me. I'm going to leave this anyway because it's different enough.

list1 <- c('a', 'b', 'c')
list2 <- c('A', 'B', 'C')
list3 <- c('a-', 'B_', '- c')

Use a regex to replace any character that is not alphabetic with an empty string:

f <- function(x) {
  return(tolower(gsub("[^[:alpha:]]", "", x)))
}

match(f(list1), f(list2))
match(f(list1), f(list3))
match(f(list2), f(list3))
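If you use this a lot, the cleanup and the lookup can be wrapped together. This is just a rough sketch of that idea; normalize and match_loose are names made up here, and the "New York" vectors are invented test data. Note that this strips everything that is not a letter, digits included, so "a1" and "a2" would compare equal; swapping in [^[:alnum:]] would keep digits if that matters for your data.

```r
# Hypothetical helper: drop every non-letter, then lowercase.
normalize <- function(x) tolower(gsub("[^[:alpha:]]", "", x))

# Hypothetical wrapper: match() on the normalized vectors.
# Returns positions in b, like base match().
match_loose <- function(a, b) match(normalize(a), normalize(b))

list1 <- c('a', 'b', 'c')
list3 <- c('a-', 'B_', '- c')

match_loose(list1, list3)
#> [1] 1 2 3

# Multi-letter strings with mixed case, spaces, and punctuation.
match_loose(c("New York", "la"), c("L.A.", "new-york"))
#> [1] 2 1
```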
