R agrep: how to match with more than 1 substitution
I'm trying to match a string to a vector of strings:
a <- c('abcde', 'abcdf', 'abcdg')
agrep('abcdh', a, max.distance=list(substitutions=1))
# [1] 1 2 3
agrep('abchh', a, max.distance=list(substitutions=2))
# character(0)
I didn't expect the latter result, since substituting two characters in the pattern makes it identical to each of the vector elements. It does, however, work with `all` instead of `substitutions`:
agrep('abchh', a, max.distance=list(all=2))
# [1] 1 2 3
What do I need to change so that more than one substitution is allowed? Or is `substitutions` just a broken option? Thanks.
Note: this question is essentially the same as this one: https://stat.ethz.ch/pipermail/r-help/2011-June/281731.html, but that one was never answered.
I did not realize that the questions were that old. Anyway:
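The function needs `cost` to be set appropriately. As ping said, you must set the maximum match cost. The reason is that, when `max.distance` is a list without a `cost` component, `all` defaults to 10% of the pattern length (rounded up, so 1 for a five-character pattern), and that cap on the total number of transformations applies even when `substitutions` is set higher. In your example: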
a <- c('abcde', 'abcdf', 'abcdg')
agrep('abcdh', a, max.distance = list(cost = 1))
# [1] 1 2 3
agrep('abchh', a, max.distance = 2)
# [1] 1 2 3
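To see why the question's `list(substitutions = 2)` call returns nothing, the two relevant numbers can be compared directly. This is a minimal sketch assuming the behaviour documented in `?agrep` (a list `max.distance` without `cost` sets `all` to 10% of the pattern length, rounded up) and the default unit `costs`. It uses base R's `adist()`, which computes the full-string generalized Levenshtein distance. The names `pattern`, `d`, `default_all` and `idx` are illustrative only.

```r
a <- c('abcde', 'abcdf', 'abcdg')
pattern <- 'abchh'

# Generalized Levenshtein distance (unit costs) from the pattern to each element:
# every element needs two substitutions ('hh' -> 'de', 'df' or 'dg').
d <- adist(pattern, a)
print(d)

# Cap on the total number of transformations when max.distance is a list
# without 'cost': 'all' defaults to 0.1, i.e. ceiling(0.1 * 5) = 1.
default_all <- ceiling(0.1 * nchar(pattern))
print(default_all)

# Raising 'all' together with 'substitutions' lets both substitutions through.
idx <- agrep(pattern, a, max.distance = list(substitutions = 2, all = 2))
print(idx)
```

So `substitutions = 2` on its own is not broken. The match is rejected because two edits exceed the defaulted `all` bound of 1.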
Now, if you set only `cost`, the function may use insertions, deletions and substitutions in any combination within that cost. If you want to evaluate only substitutions, then:
agrep('abhhh', a,
      max.distance = list(cost = 3, substitutions = 3,
                          deletions = 0, insertions = 0))
# [1] 1 2 3
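An alternative that avoids `cost` altogether is to give `all` explicitly, so it is no longer defaulted to 10%, and to zero out insertions and deletions. This is a sketch assuming the default unit `costs`. `subs_only()` is a hypothetical helper written for illustration and is not part of base R.

```r
a <- c('abcde', 'abcdf', 'abcdg')

# Hypothetical helper: approximate match allowing up to k substitutions
# and no insertions or deletions. 'all' is raised to k so the 10% default
# on the total number of transformations does not apply.
subs_only <- function(pattern, x, k) {
  agrep(pattern, x,
        max.distance = list(all = k, substitutions = k,
                            insertions = 0, deletions = 0))
}

print(subs_only('abhhh', a, 3))  # 'abhhh' needs three substitutions
print(subs_only('abhhh', a, 2))  # two substitutions are not enough
```

With insertions and deletions set to 0, the pattern can only align with a substring of the same length, so the substitution bound is the only thing that decides whether a match is found.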