简体   繁体   中英

How to optimize Sphinx search for fuzzy text matching?

Situation: I have a MySQL DB with 2mil records in total containing English and Chinese text words and corpus along with their relationships. It is on a dedicated server with 1.5G of RAM with 2.26Ghz dual core CPU. When searching with a string containing more than 30 Chinese characters, it takes around 4 seconds to get a result. This is too slow.

Search method: When running a query, once there are 4 or more words that match, consider the query a success and then sort the results by relevance and pick the one that's the highest match.

Here is a snippet of how it is done right now:

$this->sphinx->ResetFilters();
$this->sphinx->SetMatchMode(SPH_MATCH_ANY);
//Sort by relevance
$this->sphinx->SetSortMode(SPH_SORT_RELEVANCE);
$this->sphinx->SetArrayResult(true);
//Get 10 results
$this->sphinx->SetLimits(0,10);
//Filter the length
$this->sphinx->SetFilterRange('en_length', 10,50);

How can I improve the performance of the search? I want this under 1 second if possible. I've tried using SPH_MATCH_ALL and that works really fast. I believe the problem may be the matching mode that is being used for fuzzy match?

UPDATE: Using the quorum operator should be faster but using it returns unexpected values:

This is the result when using the OR operator (normal): 在此处输入图片说明 And this is how it looks with the Quorum operator (corrupted):

在此处输入图片说明

Filtering by non-FT attribute might be slow. If you are looking to get document with 4 or more matches you may want to use quorum operator:

"get me any document with more than four matches"/4

this requires SPH_MATCH_EXTENDED mode to be enabled

Hope this helps.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM