agrep max.distance arguments in R

I need some help with the specific arguments of the agrep function in R.

For max.distance, the components cost, all, insertions, deletions and substitutions each take a "maximum number/fraction" value, which can be given either as an integer or as a fraction.

I've read the documentation, but I still cannot figure out some specifics:

  • What is the difference between "cost=1" and "all=1"?
  • How is a decimal value interpreted, such as "cost=0.1", "insertions=0.9" or "all=0.25"?
  • I understand the basics of the Levenshtein distance, but how is it applied to the cost or all arguments?

Sorry if this is fairly basic, but as I said, the documentation I have read on it is slightly confusing.

Thanks in advance

Not 100% certain, but here is my understanding:

  • In max.distance, cost and all are interchangeable if you don't specify a costs argument (this is the next argument). If you do, then cost limits the total weighted cost (as per costs) of the insertions/deletions/substitutions, whereas all limits the raw count of those operations.
  • The fraction is the share of the number of characters in your pattern argument that you want to allow as insertions/deletions/substitutions (i.e. 0.1 on a 10-character pattern would allow 1 change). If you specify costs, then it is that fraction of (number of characters in pattern * max(costs)), though presumably fractions in the insertions/deletions/substitutions components of max.distance will be taken of number of characters * the corresponding costs value.
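The cost/all distinction in the first point can be modelled outside R. The Python sketch below is a hypothetical stand-in, not agrep's actual implementation: it compares two whole strings (as R's adist() does) rather than searching for the pattern inside each element of x the way agrep does, and the names Costs, weighted_edit and within are made up for illustration. It finds one minimum-cost alignment and reports both its weighted cost (what a cost limit would check) and its raw counts of insertions, deletions and substitutions (what an all limit would check).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    """Hypothetical per-operation weights, mirroring the idea of agrep's costs.

    All weights default to 1, which reduces to plain Levenshtein distance.
    """
    insertions: float = 1.0
    deletions: float = 1.0
    substitutions: float = 1.0


def weighted_edit(pattern: str, text: str, costs: Costs = Costs()):
    """Return (weighted_cost, n_ins, n_del, n_sub) for one minimum-cost
    alignment that turns the whole of `pattern` into the whole of `text`."""
    m, n = len(pattern), len(text)
    # Each cell holds (weighted cost, op count, ins, del, sub). Tuples compare
    # element by element, so among equal-cost alignments the one with fewer
    # operations wins.
    inf = (float("inf"), 0, 0, 0, 0)
    dp = [[inf] * (n + 1) for _ in range(m + 1)]
    dp[0][0] = (0.0, 0, 0, 0, 0)
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0:  # delete pattern[i - 1]
                c, k, a, d, s = dp[i - 1][j]
                options.append((c + costs.deletions, k + 1, a, d + 1, s))
            if j > 0:  # insert text[j - 1]
                c, k, a, d, s = dp[i][j - 1]
                options.append((c + costs.insertions, k + 1, a + 1, d, s))
            if i > 0 and j > 0:
                c, k, a, d, s = dp[i - 1][j - 1]
                if pattern[i - 1] == text[j - 1]:
                    options.append((c, k, a, d, s))  # characters match: free
                else:  # substitute pattern[i - 1] with text[j - 1]
                    options.append((c + costs.substitutions, k + 1, a, d, s + 1))
            dp[i][j] = min(options)
    cost, _, ins, dels, subs = dp[m][n]
    return cost, ins, dels, subs


def within(pattern, text, costs=Costs(), max_cost=None, max_all=None):
    """Check a 'cost'-style limit (weighted) and/or an 'all'-style limit
    (raw operation count) against the alignment found by weighted_edit."""
    cost, ins, dels, subs = weighted_edit(pattern, text, costs)
    if max_cost is not None and cost > max_cost:
        return False
    if max_all is not None and ins + dels + subs > max_all:
        return False
    return True


if __name__ == "__main__":
    doubled = Costs(substitutions=2)
    print(weighted_edit("lasy", "lazy"))                # (1.0, 0, 0, 1)
    print(weighted_edit("lasy", "lazy", doubled))       # (2.0, 0, 0, 1)
    print(within("lasy", "lazy", doubled, max_cost=1))  # False
    print(within("lasy", "lazy", doubled, max_all=1))   # True
```

With unit weights, "lasy" vs "lazy" needs one substitution, so limits of cost=1 and all=1 both accept it. Doubling the substitution weight leaves the raw count at 1 but raises the weighted cost to 2, so under this model a cost limit of 1 rejects the match while an all limit of 1 still accepts it.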

I agree that the documentation is not as complete as it could be. I discovered the above by building simple test examples and experimenting with them. You should be able to do the same and confirm it for yourself, particularly the last part (i.e. whether costs affects the fraction measure of the insertions/deletions/substitutions components of max.distance), which I haven't tested.
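When running such experiments, it helps to write down the bound you expect before checking what agrep actually matches. The Python sketch below models the fraction rule from the second point under stated assumptions: a value below 1 is treated as a fraction of the pattern length times a weight (max(costs) when costs is given) and rounded up, which is this sketch's reading of ?agrep; a value of 1 or more is treated as an absolute count. bound_from_limit is a hypothetical name, and whether the per-operation components use the corresponding costs value as the weight is exactly the untested question above.

```python
import math


def bound_from_limit(limit: float, pattern: str, weight: float = 1.0) -> int:
    """Hypothetical helper: turn a max.distance-style limit into an integer bound.

    Models the rule described in the answer, not R's source code:
    - a limit of 1 or more is taken as an absolute count (assumption);
    - a limit below 1 is a fraction of len(pattern) * weight, rounded up,
      where weight would be max(costs) when a costs argument is given.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    if limit >= 1:
        return int(limit)
    return math.ceil(limit * len(pattern) * weight)


if __name__ == "__main__":
    print(bound_from_limit(0.1, "abcdefghij"))            # 1 (10 chars * 0.1)
    print(bound_from_limit(0.1, "abcdefghij", weight=2))  # 2 (max(costs) = 2)
    print(bound_from_limit(0.1, "lasy"))                  # 1 (0.4 rounded up)
    print(bound_from_limit(2, "lasy"))                    # 2 (absolute count)
```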
