
How to match a string with another string that is almost the same (fuzzy matching)

In Rails, I am passing in a string: 'AE18BX21'. I am querying the database to find strings that match the input string. However, the input string and the string in the database sometimes don't match up exactly: sometimes there is an extra letter/number, sometimes a letter/number is missing, and sometimes a letter/number is a different letter/number.

I have tried a few different regular expressions, like:

Table.where("string ~ ?", 'A+E+1+8+B+X+2+1')

Table.where("string ~ ?", '(A|.)+(E|.)+(1|.)+(8|.)+(B|.)+(X|.)+(2|.)+(1|.)')
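
Here is a small Ruby sketch that shows why neither pattern behaves as hoped. It uses Ruby's regex engine as a stand-in for PostgreSQL's ~ operator, which is an assumption. For simple, unanchored patterns like these the two engines should accept and reject the same strings. The sample strings and the constant names STRICT and LOOSE are made up for illustration.

```ruby
# Assumption: Ruby's regex engine stands in for PostgreSQL's `~` operator;
# both treat these simple patterns as unanchored searches.
STRICT = /A+E+1+8+B+X+2+1/                                  # every character, in order
LOOSE  = /(A|.)+(E|.)+(1|.)+(8|.)+(B|.)+(X|.)+(2|.)+(1|.)/  # each group accepts any char

# Hypothetical sample inputs covering the mismatches from the question.
samples = {
  'AE18BX21' => 'exact',
  'AE18BX2'  => 'one character missing',
  'AE18BY21' => 'one character substituted',
  'ZZZZZZZZ' => 'unrelated, 8 characters'
}

samples.each do |str, desc|
  puts format('%-10s %-26s strict=%-5s loose=%s',
              str, desc, STRICT.match?(str), LOOSE.match?(str))
end
```

The first pattern requires every character to appear in order, so one missing or substituted character makes it fail. The second needs at least one character per group, so it accepts any string of eight or more characters, including unrelated ones. It still rejects the input when a single character is missing.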

Ideally, I would want it to return only the strings that match 80% or more.

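After reading your question, I think you want something like Levenshtein distance. This is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another, which covers the extra, missing and different characters you describe. As you mentioned in your comment, on Postgres you can use the levenshtein function from the fuzzystrmatch module.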
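
For intuition, the distance can be computed with the classic dynamic-programming table, one row per character of the source string. This minimal Ruby sketch is not PostgreSQL's own code. The method name is illustrative, and the optional cost arguments are ordered like the documented levenshtein(source, target, ins_cost, del_cost, sub_cost) signature.

```ruby
# Standard dynamic-programming Levenshtein distance with configurable costs
# (argument order mirrors PostgreSQL's documented five-argument form).
def levenshtein(source, target, ins_cost = 1, del_cost = 1, sub_cost = 1)
  # prev[j]: cost of turning the source prefix seen so far into target[0, j]
  prev = (0..target.length).map { |j| j * ins_cost }

  source.each_char.with_index(1) do |s_char, i|
    curr = [i * del_cost] # delete all i source characters
    target.each_char.with_index(1) do |t_char, j|
      curr << [
        prev[j] + del_cost,                             # delete s_char
        curr[j - 1] + ins_cost,                         # insert t_char
        prev[j - 1] + (s_char == t_char ? 0 : sub_cost) # keep or substitute
      ].min
    end
    prev = curr
  end

  prev.last
end

puts levenshtein('GUMBO', 'GAMBOL')          # 2
puts levenshtein('GUMBO', 'GAMBOL', 2, 1, 1) # 3
puts levenshtein('AE18BX21', 'AE18BX2')      # 1 (one character missing)
```

With the default costs it gives the same 2 as the first documentation example quoted below. With an insertion cost of 2 it gives 3, like the second.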

Quoting its documentation (https://www.postgresql.org/docs/9.1/static/fuzzystrmatch.html):

test=# SELECT levenshtein('GUMBO', 'GAMBOL');
 levenshtein
-------------
           2
(1 row)

test=# SELECT levenshtein('GUMBO', 'GAMBOL', 2,1,1);
 levenshtein
-------------
           3
(1 row)

test=# SELECT levenshtein_less_equal('extensive', 'exhaustive',2);
 levenshtein_less_equal
------------------------
                      3
(1 row)

test=# SELECT levenshtein_less_equal('extensive', 'exhaustive',4);
 levenshtein_less_equal
------------------------
                      4
(1 row)
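
According to the same documentation, levenshtein_less_equal is an accelerated version for use when only small distances are of interest. If the actual distance is at most max_d it returns the exact value. Otherwise it returns some value greater than max_d, which is why the first call above reports 3 even though the true distance is 4. The Ruby sketch below shows one way such a cut-off can work and is not PostgreSQL's implementation. The method name levenshtein_bounded is hypothetical, and it always returns max_d + 1 when the limit is exceeded.

```ruby
# Bounded edit distance (unit costs): exact when the distance is <= max_d,
# otherwise returns max_d + 1. Hypothetical helper, not PostgreSQL's code.
def levenshtein_bounded(source, target, max_d)
  prev = (0..target.length).to_a
  source.each_char.with_index(1) do |s_char, i|
    curr = [i]
    target.each_char.with_index(1) do |t_char, j|
      curr << [prev[j] + 1, curr[j - 1] + 1,
               prev[j - 1] + (s_char == t_char ? 0 : 1)].min
    end
    # Row minima never decrease, so once the whole row exceeds max_d the
    # final distance must too, and the remaining rows can be skipped.
    return max_d + 1 if curr.min > max_d
    prev = curr
  end
  [prev.last, max_d + 1].min
end

puts levenshtein_bounded('extensive', 'exhaustive', 2) # 3 (limit exceeded)
puts levenshtein_bounded('extensive', 'exhaustive', 4) # 4 (exact)
```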

Then you can build your SQL query with the maximum distance you are willing to accept:

-- fuzzystrmatch must be enabled once per database
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

SELECT *
FROM YourTable
WHERE levenshtein(string, 'AE18BX21') <= 2;
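
The question asks for matches of 80% or more rather than a fixed number of edits. One common way to turn the distance into a percentage is 1 - distance / (length of the longer string). That formula is an assumption here; fuzzystrmatch does not provide it. Below is a minimal Ruby sketch with hypothetical method names and made-up candidate strings.

```ruby
# Unit-cost Levenshtein distance (same dynamic-programming idea as above).
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca == cb ? 0 : 1)].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical similarity score in 0.0..1.0: 1.0 means identical strings.
def similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?
  1.0 - edit_distance(a, b).fdiv(longest)
end

input = 'AE18BX21'
candidates = %w[AE18BX21 AE18BX2 AE18BX213 AE18BY21 AE19BY21 ZZ00QQ99]
matches = candidates.select { |c| similarity(c, input) >= 0.8 }
p matches # ["AE18BX21", "AE18BX2", "AE18BX213", "AE18BY21"]
```

In SQL the same threshold could be written as WHERE 1 - levenshtein(string, 'AE18BX21')::numeric / GREATEST(length(string), length('AE18BX21')) >= 0.8. For this 8-character input, the fixed cut-off <= 2 also admits strings such as AE19BY21, which score only 75% by this measure.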
