
How do I rank search results by best match in mysql/php?

This is for a MySQL/PHP scenario:

Let's say I need to find professionals based on their qualifications. Now assume the search input is "CA,BA".

  1. I want it to match 'CA', 'MCA', 'BCA', 'MBA', ... This is easy to do with LIKE or even REGEXP in MySQL if I disregard performance. However, 'CA' is the exact match, so I want a user with CA in their profile to be ranked higher than the others.
  2. Since I am searching for two entries, I want the resulting list to be further sorted by whether the person matches (or partially matches) both qualifications rather than just one.

For the first one I guess I can use Levenshtein distance, but I am worried about performance. For the second one I have no idea at all. So my question is: how do I do this in the most performance-efficient way?

All ideas are welcome.

I would search for the exact matches and put them in an array, then search for the LIKE matches and put them in a second array.

Finally I would do an array_diff, and the result is there.
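A minimal sketch of that idea in PHP, assuming the two result sets have already been fetched as arrays of profile IDs. The function name rankExactFirst and the sample IDs are made up for illustration. Since a LIKE '%CA%' search also returns the exact matches, array_diff is used here to strip them from the partial list before the two lists are joined:

```php
<?php
// Hypothetical helper: exact matches first, then partial-only matches.
function rankExactFirst(array $exactIds, array $likeIds): array
{
    // LIKE '%CA%' also returns rows that match exactly, so drop those.
    $partialOnly = array_diff($likeIds, $exactIds);

    // array_merge renumbers integer keys, giving one clean ranked list.
    return array_merge($exactIds, $partialOnly);
}

// Example data (assumed): IDs returned by the two queries.
$exact = [7, 3];        // qualification = 'CA'
$like  = [3, 5, 7, 9];  // qualification LIKE '%CA%'

print_r(rankExactFirst($exact, $like));
```

This only separates exact from partial matches for one term; it does not by itself reward profiles that match both search terms.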

Levenshtein would likely be slow, but it is possible.

Do one query for each value to check, getting the MIN Levenshtein distance. Do a UNION ALL of the 2 queries, and use that as a subquery to select the person and the SUM of the min distances, and order by that value ascending (a smaller total distance means a closer match).
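To make the scoring concrete, here is a rough PHP sketch of the same "sum of minimum distances" idea, done in application code with PHP's built-in levenshtein() rather than in SQL. The helper name relevancyScore and the sample profiles are assumptions for illustration, and each profile is assumed to have at least one qualification:

```php
<?php
// Hypothetical helper: lower score = better match.
// Requires PHP 7.4+ (arrow functions) and a non-empty $qualifications array.
function relevancyScore(array $searchTerms, array $qualifications): int
{
    $score = 0;
    foreach ($searchTerms as $term) {
        // Smallest edit distance between this term and any qualification.
        $distances = array_map(
            fn(string $q): int => levenshtein(strtoupper($term), strtoupper($q)),
            $qualifications
        );
        $score += min($distances);
    }
    return $score;
}

// Example profiles (assumed data).
$profiles = [
    'Asha'  => ['MCA'],
    'Bilal' => ['CA', 'MBA'],
    'Chen'  => ['CA', 'BA'],
];

$scores = [];
foreach ($profiles as $name => $quals) {
    $scores[$name] = relevancyScore(['CA', 'BA'], $quals);
}

asort($scores); // ascending: smallest total distance first
print_r($scores); // Chen (0), then Bilal (1), then Asha (3)
```

A profile that matches both terms exactly scores 0, so it sorts ahead of one that matches only one term exactly, which covers both ranking requirements from the question. Doing this in PHP means loading every candidate profile first, so it is only reasonable for small result sets.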

EDIT

Assuming you can redesign the tables

Have 3 tables:-

Table of professionals (Professionals): Id, Name, ...

Table of qualifications (Qualifications): Id, QualificationName

LinkTable: ProfessionalId, QualificationId

Then do a query that does a subselect for the Levenshtein distance for the qualifications (which should mean computing it only once per qualification, not once per person's qualification):-

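-- LEVENSHTEIN() is not built into MySQL; this assumes a user-defined function is installed.
-- Lower SumRelevancy means a closer match, so the default ascending order puts the best first.
SELECT Name, SUM(Relevancy) AS SumRelevancy
FROM
(
    -- Best (smallest) distance to 'CA' per professional
    SELECT a.Id, a.Name, MIN(c.Relevancy) AS Relevancy
    FROM Professionals a
    INNER JOIN LinkTable b ON a.Id = b.ProfessionalId
    INNER JOIN
    (
        SELECT Id AS QualificationId, LEVENSHTEIN('CA', QualificationName) AS Relevancy FROM Qualifications
    ) c ON b.QualificationId = c.QualificationId
    GROUP BY a.Id, a.Name
    UNION ALL
    -- Best (smallest) distance to 'BA' per professional
    SELECT a.Id, a.Name, MIN(c.Relevancy) AS Relevancy
    FROM Professionals a
    INNER JOIN LinkTable b ON a.Id = b.ProfessionalId
    INNER JOIN
    (
        SELECT Id AS QualificationId, LEVENSHTEIN('BA', QualificationName) AS Relevancy FROM Qualifications
    ) c ON b.QualificationId = c.QualificationId
    GROUP BY a.Id, a.Name
) Sub1
-- Group by Id as well, so two professionals who share a name are not merged
GROUP BY Id, Name
ORDER BY SumRelevancy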
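The query above hard-codes two search terms. As a sketch only, the PHP below generates the same shape of query for any number of terms, with one UNION ALL branch and one "?" placeholder per term. The function name buildRankingQuery is hypothetical, and it assumes the three tables listed earlier (with Id as the key of Qualifications) plus a user-defined LEVENSHTEIN() function in MySQL:

```php
<?php
// Hypothetical helper: builds the ranking query for $termCount search terms.
function buildRankingQuery(int $termCount): string
{
    if ($termCount < 1) {
        throw new InvalidArgumentException('At least one search term is required.');
    }

    // One branch per search term; "?" is bound to the term at execution time.
    $branch = 'SELECT a.Id, a.Name, MIN(c.Relevancy) AS Relevancy'
        . ' FROM Professionals a'
        . ' INNER JOIN LinkTable b ON a.Id = b.ProfessionalId'
        . ' INNER JOIN (SELECT Id AS QualificationId,'
        . ' LEVENSHTEIN(?, QualificationName) AS Relevancy FROM Qualifications) c'
        . ' ON b.QualificationId = c.QualificationId'
        . ' GROUP BY a.Id, a.Name';

    $branches = implode(' UNION ALL ', array_fill(0, $termCount, $branch));

    // Smallest summed distance first.
    return 'SELECT Name, SUM(Relevancy) AS SumRelevancy'
        . ' FROM (' . $branches . ') Sub1'
        . ' GROUP BY Id, Name ORDER BY SumRelevancy';
}

$terms = ['CA', 'BA'];
$sql = buildRankingQuery(count($terms));
echo $sql, PHP_EOL;

// With an existing PDO connection (not created here), the terms bind in order:
// $stmt = $pdo->prepare($sql);
// $stmt->execute($terms);
```

Binding the terms as parameters rather than concatenating them keeps user input out of the SQL string.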
