How to optimize Sphinx search for fuzzy text matching?

Situation: I have a MySQL database with about 2 million records containing English and Chinese words and corpus text, along with their relationships. It runs on a dedicated server with 1.5 GB of RAM and a 2.26 GHz dual-core CPU. Searching with a string of more than 30 Chinese characters takes around 4 seconds to return a result, which is too slow.

Search method: When running a query, a document counts as a hit once 4 or more words match. The results are then sorted by relevance and the highest-ranked match is picked.

Here is a snippet of how it is done right now:

$this->sphinx->ResetFilters();
// Match documents containing any of the query words
$this->sphinx->SetMatchMode(SPH_MATCH_ANY);
// Sort by relevance
$this->sphinx->SetSortMode(SPH_SORT_RELEVANCE);
$this->sphinx->SetArrayResult(true);
// Get the first 10 results
$this->sphinx->SetLimits(0, 10);
// Keep only rows whose en_length attribute is between 10 and 50
$this->sphinx->SetFilterRange('en_length', 10, 50);
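
The snippet above configures the client but does not show how the top match is taken from the result. A minimal sketch of that last step, assuming the result array has the shape the Sphinx PHP API returns when `SetArrayResult(true)` is on (a `matches` list whose entries carry `id`, `weight` and `attrs`). The helper name `pickBestMatch` is hypothetical and not part of the API:

```php
<?php
// Hypothetical helper (not part of the Sphinx API): returns the match with
// the highest weight from a query result, or null when there is none.
function pickBestMatch($result): ?array
{
    // Query() returns false on failure, and 'matches' is assumed to be
    // missing or empty when nothing matched.
    if (!is_array($result) || empty($result['matches'])) {
        return null;
    }
    $best = null;
    foreach ($result['matches'] as $match) {
        // With SPH_SORT_RELEVANCE the first entry should already be the best;
        // the loop just avoids relying on that ordering.
        if ($best === null || $match['weight'] > $best['weight']) {
            $best = $match;
        }
    }
    return $best;
}
```

Since the results are already sorted by relevance, taking the first entry of `matches` would usually be enough; the loop is only a defensive choice.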

How can I improve the performance of the search? I want this under 1 second if possible. I've tried SPH_MATCH_ALL, and that is very fast. I suspect the problem is the matching mode used for the fuzzy match.

UPDATE: Using the quorum operator should be faster, but it returns unexpected values:

This is the result when using the OR operator (normal): [image]

And this is how it looks with the quorum operator (corrupted): [image]

Filtering by a non-full-text attribute might be slow. If you want documents with 4 or more matching words, you may want to use the quorum operator:

"get me any document with more than four matches"/4

This requires SPH_MATCH_EXTENDED mode to be enabled.
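
As a rough illustration of the quorum query above, here is a minimal sketch that turns raw user text into a `"..."/N` expression. The helper name `buildQuorumQuery`, the list of characters it strips, and the index name `my_index` in the comments are all assumptions made for this example, not something taken from the question:

```php
<?php
// Hypothetical helper (not part of the Sphinx API): wraps user text in a
// quorum expression such as "word1 word2 word3"/4.
function buildQuorumQuery(string $text, int $threshold): string
{
    // Assumed list of characters with special meaning in extended query
    // syntax; they are replaced by spaces so they cannot break the
    // surrounding quotes.
    $special = ['\\', '(', ')', '|', '-', '!', '@', '~', '"', '&', '/', '^', '$', '=', '<', '>'];
    $clean = str_replace($special, ' ', $text);
    // Collapse runs of whitespace left behind by the replacement.
    $clean = trim(preg_replace('/\s+/u', ' ', $clean));
    return sprintf('"%s"/%d', $clean, $threshold);
}

// Possible usage with the client from the question (needs a running searchd;
// 'my_index' is a placeholder index name):
//   $this->sphinx->SetMatchMode(SPH_MATCH_EXTENDED);
//   $result = $this->sphinx->Query(buildQuorumQuery($text, 4), 'my_index');
```

The threshold of 4 mirrors the "4 or more matching words" rule from the question; how Chinese text is split into words depends on the index's own character settings, which this sketch does not touch.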

Hope this helps.
