
'best match' human name search/rank using Sphinx search

I'm trying to add a predictive 'best match' name search to a custom CRM I've built, and I'm running into some issues. I chose Sphinx thinking it would do what I wanted pretty much out of the box, yet it isn't working as I'd hoped. I understand the matching modes Sphinx uses, but I'm not sure how I'd go about getting something like this, for example:

If I query: Mike Shinoda

It should be able to pull matches like these, ranked by best match: Mike Shinoda | Shinoda, Mike | Mike Shinoji | Michael Shinoda | Shinoda, Michael | Mike James Shinoda | Mike and Ike Shinoda | Shinoda, Miles

What's the best way to go about something like that? I'm not married to Sphinx; I just couldn't find anything that looked like it would do the job better.

I already tried implementing the suggestion in the Stack Overflow question "Sphinx and 'did you mean ... ?' suggestions idea. Will it work?", but it didn't work very well: matching mode SPH_MATCH_ANY matched far too many records, and SPH_MATCH_ALL would pull in records like 'andrus Cheryl' when the query was 'sheryl curry' (because all the letters in 'sheryl curry' are in 'andrus Cheryl').

EDIT

I am only indexing one field: contact_name

First, Sphinx won't know that Mike = Michael. You have to tell it about such equivalences explicitly; the wordforms feature exists especially for that :)
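To make the wordforms idea concrete, here is a minimal Python sketch. It does not call Sphinx; it only shows (a) rendering a nickname map as `source > destination` lines for a wordforms file and (b) a rough simulation of the substitution Sphinx applies after tokenizing. The `NAME_FORMS` map is hypothetical sample data, and the tokenizer is an assumption that approximates the default charset_table (punctuation acts as a separator).

```python
import re

# Hypothetical sample data: source word -> form that both indexed names and
# queries get reduced to. Replace with your own list of name equivalences.
NAME_FORMS = {
    "michael": "mike",
    "mikey": "mike",
    "micheal": "mike",  # common misspelling
}


def wordforms_lines(forms: dict[str, str]) -> list[str]:
    """Render the mapping as 'source > destination' lines for a wordforms file."""
    return [f"{src} > {dst}" for src, dst in sorted(forms.items())]


def normalize(text: str, forms: dict[str, str]) -> list[str]:
    """Roughly simulate tokenizing followed by wordforms substitution.

    Assumption: tokens are lowercase alphanumeric runs, approximating the
    default charset_table, so the ',' in 'Shinoda, Michael' is a separator.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [forms.get(tok, tok) for tok in tokens]


if __name__ == "__main__":
    print("\n".join(wordforms_lines(NAME_FORMS)))
    # Indexed text and query reduce to the same set of words.
    print(normalize("Shinoda, Michael", NAME_FORMS))
    print(normalize("Mike Shinoda", NAME_FORMS))
```

The generated lines would go into a text file referenced by the `wordforms` directive of the index; since wordforms are applied at indexing time as well as query time, the index has to be rebuilt after the file changes.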

> because all the letters in sheryl curry are in 'andrus Cheryl'

Sphinx won't do that. Sphinx matches on whole words; it doesn't do 'rearranged letters' matches.
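The difference between "all the letters are present" and "all the words are present" can be shown with a small Python sketch. Both functions are hypothetical illustrations, not Sphinx internals; `whole_words_contained` only approximates what an all-words match over a word-level index requires.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercase alphanumeric runs; punctuation such as ',' separates words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def letters_contained(query: str, candidate: str) -> bool:
    """True if every letter of the query occurs somewhere in the candidate."""
    return set(re.findall(r"[a-z]", query.lower())) <= set(candidate.lower())


def whole_words_contained(query: str, candidate: str) -> bool:
    """True if every query word occurs as a complete word in the candidate."""
    return _words(query) <= _words(candidate)


if __name__ == "__main__":
    # Letter-level containment holds, so a letter-based scheme would match...
    print(letters_contained("sheryl curry", "andrus Cheryl"))
    # ...but a whole-word match rejects it.
    print(whole_words_contained("sheryl curry", "andrus Cheryl"))
```

Under plain word-level indexing, 'andrus Cheryl' simply has no 'sheryl' or 'curry' token, so it cannot satisfy the query.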

Unless you've specifically implemented that (perhaps you did, following the "did you mean" suggestions), in which case it's not really what you want.

I suggest going back to plain, normal Sphinx indexing (no trigrams) and then running a query like

"^Mike Shinoda$" | "Mike Shinoda" | "^Mike Shinoda" | "Mike Shinoda$" | (^Mike Shinoda) | (Mike Shinoda$) | (Mike Shinoda) | (Mike Shinoda)

using SPH_MATCH_EXTENDED and SPH_RANK_WORDCOUNT

with wordforms to take care of the Michael > Mike equivalence.
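Writing that query by hand for every search gets tedious, so here is a minimal Python sketch of a hypothetical `build_name_query` helper that produces the seven distinct groups of the query above for any name, from "whole field is exactly this phrase" down to "all words anywhere". The escaping of extended-syntax operators is an assumption that approximates the official client's `EscapeString()`; check the character set against your Sphinx version.

```python
import re

# Characters with special meaning in Sphinx extended query syntax.
# Assumption: approximates what the official client's EscapeString() escapes.
_SPECIAL = re.compile(r'([\\()|\-!@~"&/^$=<])')


def escape_term(term: str) -> str:
    """Backslash-escape extended-syntax operators inside one name token."""
    return _SPECIAL.sub(r"\\\1", term)


def build_name_query(name: str) -> str:
    """Hypothetical helper: OR together stricter-to-looser variants of a name."""
    words = [escape_term(w) for w in name.split()]
    if not words:
        raise ValueError("empty name")
    plain = " ".join(words)
    variants = [
        f'"^{plain}$"',  # phrase is the whole field
        f'"{plain}"',  # phrase anywhere
        f'"^{plain}"',  # phrase at the start of the field
        f'"{plain}$"',  # phrase at the end of the field
        f"(^{plain})",  # all words, first one at the field start
        f"({plain}$)",  # all words, last one at the field end
        f"({plain})",  # all words, any order and position
    ]
    return " | ".join(variants)


if __name__ == "__main__":
    print(build_name_query("Mike Shinoda"))
```

The resulting string would then be sent with match mode SPH_MATCH_EXTENDED and ranker SPH_RANK_WORDCOUNT; the bundled Python client (sphinxapi.py) exposes `SetMatchMode()`, `SetRankingMode()` and `Query()` for this, but verify the names against the client version you ship.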
