
String matching algorithm

Say I have 3 strings, and then 1 more string.
Is there an algorithm that would allow me to find which one of the first 3 strings matches the 4th string the most?
None of the strings are going to be exact matches; I'm just trying to find the closest match.
And if the algorithm already exists in the STL, that would be nice.

Thanks in advance.

You don't specify what exactly you mean by "matches the most", so I assume you don't have precise requirements. In that case, Levenshtein distance is a reasonable metric. Simply compute the Levenshtein distance between each of the three strings and the fourth, and pick the one that gives the lowest distance.
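As a rough illustration of that approach, here is a minimal C++ sketch. The names `levenshtein` and `closestMatch` are made up for this example and are not part of the STL. It compares raw `char` values, so it assumes single-byte text rather than multi-byte encodings such as UTF-8.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance between a and b, keeping only two rows of the
// dynamic-programming table. Compares bytes, not Unicode code points.
std::size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    // Row 0: turning the empty prefix of a into b[0..j) takes j insertions.
    std::iota(prev.begin(), prev.end(), std::size_t{0});
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;  // turning a[0..i) into the empty string takes i deletions
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1,          // deletion
                                curr[j - 1] + 1,      // insertion
                                prev[j - 1] + cost}); // substitution or match
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Hypothetical helper: returns the candidate with the smallest distance to
// target. Ties go to the earliest candidate. Assumes candidates is non-empty.
std::string closestMatch(const std::vector<std::string>& candidates,
                         const std::string& target) {
    return *std::min_element(
        candidates.begin(), candidates.end(),
        [&target](const std::string& x, const std::string& y) {
            return levenshtein(x, target) < levenshtein(y, target);
        });
}
```

Calling `closestMatch` with the first three strings as `candidates` and the fourth as `target` gives the closest one. The comparator recomputes distances on each comparison, which is fine for three strings; for long lists it would be better to compute each distance once.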

You can implement the Levenshtein distance algorithm; it provides a very nice measure of how close a match you have between two strings. It measures how many keystrokes (single-character insertions, deletions, or substitutions) you need to make in order to turn one string into the other. You can find a C++ implementation here.
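Reading "keystrokes" as single-character insertions, deletions, and substitutions, the distance is usually defined by the following recurrence. The symbols are notation chosen for this note: $d_{i,j}$ is the distance between the first $i$ characters of $a$ and the first $j$ characters of $b$.

```latex
d_{i,0} = i, \qquad d_{0,j} = j, \qquad
d_{i,j} = \min\bigl(
    d_{i-1,j} + 1,\;
    d_{i,j-1} + 1,\;
    d_{i-1,j-1} + [a_i \neq b_j]
\bigr)
```

Here $[a_i \neq b_j]$ is 1 when the characters differ and 0 otherwise. For example, turning "kitten" into "sitting" takes 3 edits: substitute k with s, substitute e with i, and insert g at the end.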

Compute the Levenshtein distance between string #4 and each of the three strings that you have. Pick the string with the smallest distance.

There's nothing ready-made in the STL, but what you need is some kind of string metric.

You have an approximate string matching problem. Depending on what kind of matching you want to perform, you will use a different algorithm. There are many: SOUNDEX, Jaro-Winkler, Levenshtein distance, Metaphone, etc. Regarding the STL, I don't know of any functions that implement those algorithms, but you can take a look here for some source code using C++. Also, note that if you are getting your strings from a database, it is very likely that your database engine implements some of those algorithms (most likely SOUNDEX).
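To show how a phonetic algorithm differs from an edit-distance metric, here is a minimal sketch of American Soundex in C++. The function names `soundexCode` and `soundex` are made up for this example, and it assumes ASCII English words. Database implementations of SOUNDEX may differ in edge cases.

```cpp
#include <cctype>
#include <string>

// Soundex digit for a letter. '0' marks vowels and y (which separate
// repeated codes); '-' marks h and w (which do not).
char soundexCode(char c) {
    switch (std::tolower(static_cast<unsigned char>(c))) {
        case 'b': case 'f': case 'p': case 'v':
            return '1';
        case 'c': case 'g': case 'j': case 'k':
        case 'q': case 's': case 'x': case 'z':
            return '2';
        case 'd': case 't':
            return '3';
        case 'l':
            return '4';
        case 'm': case 'n':
            return '5';
        case 'r':
            return '6';
        case 'h': case 'w':
            return '-';
        default:
            return '0';
    }
}

// Returns the 4-character Soundex key (letter plus three digits), or an
// empty string if the word contains no ASCII letters.
std::string soundex(const std::string& word) {
    std::string key;
    char prev = '0';
    for (char c : word) {
        if (!std::isalpha(static_cast<unsigned char>(c))) continue;
        const char code = soundexCode(c);
        if (key.empty()) {
            // The first letter is kept as-is; its code still suppresses
            // an immediately following letter with the same code.
            key += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
            prev = code;
            continue;
        }
        if (code == '-') continue;  // h and w are skipped and leave prev alone
        if (code != '0' && code != prev) key += code;
        prev = code;
        if (key.size() == 4) break;
    }
    if (!key.empty()) key.resize(4, '0');  // pad short keys with zeros
    return key;
}
```

Two strings "match" under this scheme when their keys are equal, so "Robert" and "Rupert" both map to R163. Unlike Levenshtein distance, this gives a yes/no answer rather than a ranking, so it suits name lookups better than picking the closest of several arbitrary strings.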
