
How to determine which string in an array is most similar to a given string?

Given a string,

string name = "Michael";

I want to be able to evaluate which string in an array is most similar:

string[] names = new[] { "John", "Adam", "Paul", "Mike", "John-Michael" };

I want to create a message for the user: "We couldn't find 'Michael', but 'John-Michael' is close. Is that what you meant?" How would I make this determination?

This is usually done with the edit distance (Levenshtein distance): the closest word is the one that needs the fewest deletions, insertions, or substitutions to be transformed into the other.

There's an article with a generic C# implementation here.
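For reference, here is a minimal C# sketch of the edit-distance approach described above. It is not the implementation from the linked article; the class name `ClosestMatch` and the method names `Levenshtein` and `FindClosest` are made up for illustration, and the comparison is assumed to be case-sensitive with every edit costing 1.

```csharp
using System;
using System.Linq;

public static class ClosestMatch
{
    // Levenshtein distance via dynamic programming, keeping only two rows.
    // Assumption: case-sensitive comparison, unit cost for each edit.
    public static int Levenshtein(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];

        // Turning the empty string into the first j chars of b takes j insertions.
        for (int j = 0; j <= b.Length; j++)
            previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            // Turning the first i chars of a into the empty string takes i deletions.
            current[0] = i;

            for (int j = 1; j <= b.Length; j++)
            {
                int substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.Min(substitution, Math.Min(deletion, insertion));
            }

            // The row just computed becomes the "previous" row for the next i.
            var temp = previous;
            previous = current;
            current = temp;
        }

        return previous[b.Length];
    }

    // Returns the candidate with the smallest distance to target.
    // Ties go to the earliest candidate; an empty array throws InvalidOperationException.
    public static string FindClosest(string target, string[] candidates)
    {
        return candidates.OrderBy(c => Levenshtein(target, c)).First();
    }
}
```

Calling `ClosestMatch.FindClosest(name, names)` with the values from the question picks "Mike" (distance 4) rather than "John-Michael" (distance 5), because plain edit distance penalizes the extra "John-" prefix.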

Here are the results for your example using the Levenshtein distance (computed in Mathematica):

EditDistance["Michael",#]&/@{"John","Adam","Paul","Mike","John-Michael"}
{6,6,5,4,5}  

And here are the results using the Smith-Waterman similarity test:

SmithWatermanSimilarity["Michael",#]&/@{"John","Adam","Paul","Mike","John-Michael"}
{0.,0.,0.,2.,7.} 

HTH!
