Searching a single MySQL text column with fuzzy matching

I have a MySQL InnoDB table with a 'name' column (VARCHAR(255)) which I want users to be able to search against, returning all the matching rows. However, I can't just use a LIKE query, because the search needs to allow for users typing in names which are similar to the stored names (e.g. prefixing with 'The', or not knowing that the correct name includes an apostrophe).

Two examples are:

Name in DB: 'Rose and Crown'

Example searches which should match: 'Rose & Crown', 'Rose and Crown', 'rose and crown', 'The Rose and Crown'

Name in DB: 'Diver's Inn'

Example searches which should match: 'Divers' Inn', 'The Diver's Inn', 'Divers Inn'

I also want to be able to rank the results by a 'closest match' relevance, although I'm not sure how that would be done (edit distance perhaps?).

It's unlikely that the table will ever grow beyond a few thousand rows, so a method which doesn't scale to millions of rows is fine. Once entered, the name value for a given row will not change, so if an expensive indexing operation is required, that's not a problem.
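
For a table of a few thousand rows, one possible approach is to normalise both the stored names and the query in application code and then rank by edit distance. The following is only a sketch of that idea, not the solution adopted in this question: the function names (`normalize`, `edit_distance`, `search`), the normalisation rules and the `max_distance` threshold are all assumptions chosen to cover the two examples above.

```python
import re


def normalize(name):
    """Reduce a name to a canonical form (hypothetical rules, see lead-in)."""
    s = name.lower()
    s = s.replace("&", " and ")
    s = re.sub(r"['’]", "", s)           # drop apostrophes: "diver's" -> "divers"
    s = re.sub(r"[^a-z0-9 ]+", " ", s)   # any other punctuation becomes a space
    words = s.split()
    if words and words[0] == "the":      # ignore a leading article
        words = words[1:]
    return " ".join(words)


def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def search(query, names, max_distance=2):
    """Return (distance, name) pairs within max_distance, closest first."""
    q = normalize(query)
    scored = [(edit_distance(q, normalize(n)), n) for n in names]
    return sorted((d, n) for d, n in scored if d <= max_distance)


if __name__ == "__main__":
    names = ["Rose and Crown", "Diver's Inn"]
    print(search("The Rose & Crown", names))
    print(search("Divers' Inn", names))
```

Because the names never change once entered, the normalised form could also be computed once and stored in an extra column, so that only the query needs normalising at search time.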

Is there an existing tool which will perform this task? I've looked at Zend_Search_Lucene, but that seems to focus on documents, whereas I'm only interested in searching a single column.

Edit: On SOUNDEX searching, this doesn't produce the results I want. For example:

SELECT soundex( 'the rose & crown' ) AS soundex1, soundex( 'rose and crown' ) AS soundex2;
soundex1    soundex2
T6265       R253265

Solution: In the end I've used Zend_Search_Lucene and just pretended that every name is in fact a document, which seems to achieve the result I want. I guess it's full text search in a way, even though each string is at most 3-4 words.

Full Text Search (FTS) is the term for the database functionality you want.

Here is an SO question that comes very close to what you want. While the answer is for PHP and MySQL, the general principle still applies:

How do I do a fuzzy match of company names in MYSQL with PHP for auto-complete?

Basically, you would use SOUNDEX to get what you want. If you need more power, longer strings, etc., you might want to look into Double Metaphone, which is an improvement over Metaphone and SOUNDEX:

http://aspell.net/metaphone/

http://www.atomodo.com/code/double-metaphone
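
As the asker's edit shows, running SOUNDEX over the whole string fails as soon as the first word differs ('the rose & crown' starts with T, 'rose and crown' with R). One way around this is to compute a phonetic code per word, after dropping a leading 'the' and expanding '&'. The sketch below illustrates that idea under some assumptions: `soundex` implements the standard four-character American Soundex, which is not identical to MySQL's SOUNDEX() (MySQL returns arbitrarily long codes), and `phonetic_key` with its clean-up rules is a hypothetical helper, not part of any library.

```python
import re

# Letter groups of standard Soundex; the group index is the digit.
_GROUPS = ("bfpv", "cgjkqsxz", "dt", "l", "mn", "r")
_CODES = {ch: str(i) for i, group in enumerate(_GROUPS, start=1) for ch in group}


def soundex(word):
    """Standard four-character American Soundex of a single word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":   # h and w do not separate equal codes; vowels do
            prev = code
    return (result + "000")[:4]


def phonetic_key(name):
    """Per-word Soundex key (hypothetical helper, see lead-in)."""
    s = name.lower().replace("&", " and ").replace("'", "")
    words = re.findall(r"[a-z]+", s)
    if words and words[0] == "the":   # ignore a leading article
        words = words[1:]
    return " ".join(soundex(w) for w in words)


if __name__ == "__main__":
    for name in ("The Rose & Crown", "rose and crown", "Divers' Inn", "Diver's Inn"):
        print(name, "->", phonetic_key(name))
```

With a key like this stored alongside each name, the lookup becomes an equality comparison on the key; a Double Metaphone implementation could be substituted for `soundex` without changing the rest.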

