Fuzzy, but not too fuzzy, string matching using agrep

Question

I have a vector of strings like this:

text <- c("Car", "Ca-R", "My Car", "I drive cars", "Chars", "CanCan")

I would like to match a pattern so that it is matched only once and with at most one substitution/insertion. The result should look like this:

> "Car"

I tried the following to match my pattern only once with at most one substitution/insertion etc., and got the following:

> agrep("ca?", text, ignore.case = T, max = list(substitutions = 1, insertions = 1, deletions = 1, all = 1), value = T)
[1] "Car"          "Ca-R"         "My Car"       "I drive cars" "CanCan"

Is there a way to exclude the strings which are n characters longer than my pattern?

Answer 1

An alternative which replaces agrep with adist:

text[which(adist("ca?", text, ignore.case=TRUE) <= 1)]

adist gives the number of insertions, deletions, and substitutions required to convert one string into another, so keeping only the elements whose adist is less than or equal to one should give you what you want, I think.

This answer is probably less appropriate if you really want to exclude strings that are "n characters longer" than the pattern (with n being variable), rather than just match whole words (where n is always 1 in your example).
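To see why the adist filter is so strict, it may help to look at the distances themselves. The following is a minimal sketch using the vector from the question; the names `pattern`, `d`, `max_edits`, and `result` are only illustrative and not part of the original answer.

```r
text <- c("Car", "Ca-R", "My Car", "I drive cars", "Chars", "CanCan")
pattern <- "ca?"

# adist() returns a matrix with one row per pattern and one column per string;
# flatten it and label each distance with the string it belongs to.
d <- adist(pattern, text, ignore.case = TRUE)
d <- setNames(as.vector(d), text)
print(d)

# Keep only strings within max_edits whole-string edits of the pattern
# (max_edits is an assumed, adjustable threshold).
max_edits <- 1
result <- text[d <= max_edits]
print(result)
```

Because adist compares whole strings, every extra character counts as an edit, so a threshold of 1 also caps the length difference at 1. Raising `max_edits` loosens both limits at once, which is the trade-off mentioned above.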

Answer 2

You can use nchar to limit the strings based on their length:

pattern <- "ca?"
matches <- agrep(pattern, text, ignore.case = TRUE, max.distance = list(substitutions = 1, insertions = 1, deletions = 1, all = 1), value = TRUE)
n <- 4
matches[nchar(matches) < n+nchar(pattern)]
# [1] "Car"    "Ca-R"   "My Car" "CanCan"
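If the length limit needs to vary, the two steps above can be wrapped in a small function. This is only a sketch: `fuzzy_within` and its arguments `n` and `max_edits` are hypothetical names introduced here, and it assumes the same agrep settings as in the answer.

```r
# Hypothetical helper: fuzzy-match `pattern` in `x` with agrep, then drop
# matches that are n or more characters longer than the pattern.
fuzzy_within <- function(pattern, x, n, max_edits = 1) {
  hits <- agrep(
    pattern, x,
    ignore.case = TRUE,
    max.distance = list(
      substitutions = max_edits,
      insertions = max_edits,
      deletions = max_edits,
      all = max_edits
    ),
    value = TRUE
  )
  # Same length filter as above: keep strings shorter than n + nchar(pattern)
  hits[nchar(hits) < n + nchar(pattern)]
}

text <- c("Car", "Ca-R", "My Car", "I drive cars", "Chars", "CanCan")
loose <- fuzzy_within("ca?", text, n = 4)  # same cutoff as in the answer
tight <- fuzzy_within("ca?", text, n = 1)  # no longer than the pattern itself
print(loose)
print(tight)
```

A smaller `n` tightens the filter; with `n = 1` only matches no longer than the pattern survive, which for this vector should leave just "Car".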

Solution 1
1 Accepted 2014-10-07 14:22:59

Solution 2
0 2014-10-07 14:18:46
