MySQL matching most specific person's name

I am searching a MySQL table for the likely match for a specific person. I am using lots of other criteria, but assume we are just talking about the given names field.

Say the search value is John William and the table contains 4 rows with these name values:

John
John William 
John William Henry
John Paul 

The first 3 could be the right person. I want the search to return the one that most closely matches the name I have, which in this case is row 2.

The original code first runs an exact-match query using the full supplied name(s). If that returns no records, it runs a LIKE query using all the names with % on the end. If that fails, it runs an exact search on the first name only. In the above example the first query returns record 2, which is what is wanted, so the other queries are not run.

The problem is that if the search name is John William Henry George, you potentially have to run about 5 queries to gradually make the search less specific. The other problem is that the search value could be John W and the table could contain John William, which I would want to match, or vice versa.

Is there some way of doing a single query that will return the closest match? In other words, one that will return just row 2 in the above example.

In order of best fit, this is what I think a search for John William Henry should match:

John William Henry
John William Henry %
John William H
John William H %
John W H
John W H %
John W
John

Note that there is no % after John W, because that would match John WB, who cannot be the right person.

OK, a new idea to try to make it more efficient. Is it possible to query the results of a previous query, with some PHP code in between? I expect not. Pseudo code as follows:

$coarse = Mysql search for John%
$count = mysql_num_rows($coarse);
if ($count == 1) { 
   $rec = mysql_fetch_row($coarse);
   return $rec[0];
} 
// Assume that produced 50 records. Now want to search within those only instead of millions

$fine = Mysql search within $coarse results for John William% 
$count = mysql_num_rows($fine);
if ($count == 1) { 
   $rec = mysql_fetch_row($fine);
   return $rec[0];
} 

I could obviously do the fine search in PHP, but is it possible to do it in SQL, or could I perhaps do the above with a stored procedure?
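One way to search within the coarse results in SQL is to store them in a temporary table and run the finer query against that table. The sketch below is a minimal illustration only: it uses Python with an in-memory SQLite database as a stand-in for PHP and MySQL, and the table name `people`, the column `firstName`, and the temporary table `coarse` are assumptions made for the example. In MySQL the equivalent statement would be `CREATE TEMPORARY TABLE`.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, firstName TEXT)")
conn.executemany(
    "INSERT INTO people (firstName) VALUES (?)",
    [("John",), ("John William",), ("John William Henry",),
     ("John Paul",), ("Mary Ann",)],
)

# Coarse pass: copy only the rows starting with the first name
# into a temporary table that lives for this connection.
conn.execute("CREATE TEMP TABLE coarse (id INTEGER, firstName TEXT)")
conn.execute(
    "INSERT INTO coarse SELECT id, firstName FROM people WHERE firstName LIKE ?",
    ("John%",),
)
coarse_count = conn.execute("SELECT COUNT(*) FROM coarse").fetchone()[0]

# Fine pass: query the small temporary table, not the full table.
fine = conn.execute(
    "SELECT id, firstName FROM coarse WHERE firstName LIKE ? ORDER BY id",
    ("John William%",),
).fetchall()

print(coarse_count)  # 4
print(fine)          # [(2, 'John William'), (3, 'John William Henry')]
```

Whether this is faster than a single indexed query on the full table depends on the data and indexes, so it is worth measuring before adopting it.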

I'd create a stored procedure. Use IF ... ELSE with an EXISTS check, then use NOT IN to exclude the rows already returned by query 1.

Note: ANSI SQL has an EXCEPT operator, but MySQL did not support it before version 8.0.31, so on older versions you'd use NOT IN to exclude the rows of query 1.

Also, don't put too much logic in the database/SQL. If you want to get fancy with your name matching, I'd select a basic list of matches with SQL and then use PHP code (with regular expressions) for the 'fuzzy/best' match logic.
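As a rough sketch of that application-side step, the code below ranks a list of candidate names (as a coarse SQL query might return them) against the search name. It is written in Python in place of PHP, purely for illustration. The scoring rule is an assumption that approximates the best-fit order in the question: a full given name that agrees scores 2, an initial that fits scores 1, and any conflicting name rejects the candidate. The function names `match_score`, `rank_names` and `best_match` are hypothetical.

```python
def match_score(search, candidate):
    """Return (score, extra_names), or None if the two names conflict."""
    s_tokens = search.lower().split()
    c_tokens = candidate.lower().split()
    if not s_tokens or not c_tokens:
        return None
    score = 0
    for s, c in zip(s_tokens, c_tokens):
        if s == c:
            score += 2      # the given name agrees in full
        elif (len(s) == 1 or len(c) == 1) and s[0] == c[0]:
            score += 1      # one side is an initial that fits the other
        else:
            return None     # e.g. "william" vs "wb", or "william" vs "paul"
    # Names present on only one side count against the candidate.
    extra = abs(len(s_tokens) - len(c_tokens))
    return score, extra


def rank_names(search, candidates):
    """Return the non-conflicting candidates, best fit first."""
    scored = []
    for name in candidates:
        result = match_score(search, name)
        if result is not None:
            score, extra = result
            scored.append((-score, extra, name))
    return [name for _, _, name in sorted(scored)]


def best_match(search, candidates):
    ranked = rank_names(search, candidates)
    return ranked[0] if ranked else None


rows = ["John", "John William", "John William Henry", "John Paul"]
print(best_match("John William", rows))  # John William
```

With this rule, John WB is rejected for a search on John William Henry, because WB is neither the full name William nor a single initial.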

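DELIMITER //

-- find_closest_name is a placeholder procedure name; your_table stands in for the real table.
CREATE PROCEDURE find_closest_name(IN p_name VARCHAR(255))
BEGIN
    DECLARE v_prefix VARCHAR(255) DEFAULT p_name;

    -- Drop one trailing character at a time until the prefix matches a row.
    -- If nothing ever matches, the prefix ends up empty and LIKE '%' returns every non-NULL name.
    WHILE CHAR_LENGTH(v_prefix) > 0
          AND NOT EXISTS (SELECT 1 FROM your_table
                          WHERE firstName LIKE CONCAT(v_prefix, '%')) DO
        SET v_prefix = LEFT(v_prefix, CHAR_LENGTH(v_prefix) - 1);
    END WHILE;

    SELECT * FROM your_table WHERE firstName LIKE CONCAT(v_prefix, '%');
END //

DELIMITER ;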

This is untested, but should get you where you want to go. The procedure starts with the full name and removes one character at a time until the shortened prefix matches at least one row.

Maybe try a different approach. Have a look at MySQL's full-text search capabilities.

Let me give you an easy example to begin with. Let's say your table looks like this:

 CREATE TABLE people (
     id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
     fullname TEXT,
     FULLTEXT (fullname)
 ) ENGINE=InnoDB;

Then you can query it like this:

SELECT *, 
    MATCH (fullname) AGAINST ('+John William' IN BOOLEAN MODE) AS Probability
FROM people
WHERE MATCH (fullname)
AGAINST ('+John William' IN BOOLEAN MODE)
ORDER BY Probability DESC;

The operators might have to be altered slightly to get your desired result.

Keep in mind that full-text indexes on InnoDB require MySQL 5.6 or later. If that is not available, you have to use MyISAM as the engine.

I think the simplest approach is just to run the searches in order, from the most specific to the least specific.
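The most-specific-to-least-specific order can also be folded into one query by ranking the rows with a CASE expression in ORDER BY. The following is a sketch under stated assumptions, not a tested MySQL solution: it runs against an in-memory SQLite database from Python, the table `people` and column `firstName` are assumed names, and the function `closest_match` is hypothetical. The SQL uses only LIKE, IN, CASE, ABS, LENGTH and LIMIT, which MySQL also provides.

```python
import sqlite3


def closest_match(conn, full_name):
    """Return the stored name closest to full_name, or None."""
    tokens = full_name.split()
    if not tokens:
        return None
    full = " ".join(tokens)
    # "John William Henry" -> ["John William Henry", "John William", "John"]
    prefixes = [" ".join(tokens[:n]) for n in range(len(tokens), 0, -1)]
    placeholders = ", ".join("?" for _ in prefixes)
    pattern = full + " %"  # the search name followed by extra given names
    sql = f"""
        SELECT firstName
        FROM people
        WHERE firstName IN ({placeholders})
           OR firstName LIKE ?
        ORDER BY CASE
                     WHEN firstName = ? THEN 0     -- exact match first
                     WHEN firstName LIKE ? THEN 1  -- then longer stored names
                     ELSE 2                        -- then shorter stored names
                 END,
                 ABS(LENGTH(firstName) - ?)        -- closest in length wins
        LIMIT 1
    """
    params = prefixes + [pattern, full, pattern, len(full)]
    row = conn.execute(sql, params).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, firstName TEXT)")
conn.executemany(
    "INSERT INTO people (firstName) VALUES (?)",
    [("John",), ("John William",), ("John William Henry",), ("John Paul",)],
)

print(closest_match(conn, "John William"))               # John William
print(closest_match(conn, "John William Henry George"))  # John William Henry
```

This version matches whole names only, so a search for John W would not find John William; handling initials would need extra patterns or a second pass in application code.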
