
Best way to search for names in mysql

I want to search for a user named "Martins Silva". I'm using fulltext in BOOLEAN MODE.

 MATCH(name,lastname) AGAINST('+martins +silva' IN BOOLEAN MODE)

The search for "Martins Silva" returns:

Orleans Silva De Martins (1)
Armistrong Oliveira Martins Da Silva (2)
Douglas Martins Vieira Da Silva (3)
Glauciene Silva Martins (4)
Jose Martins Silva (5)
...

The problem is that there is a user named "Martins Silva" in the database, but it only appears at position 540 of this result set.

This is the result I should expect from reading the documentation and seeing how the rank is calculated. However, that does not help me solve the problem. I also tried searching with LIKE, but I get the same result.

Given that result set, the best for me would be:

Martins Silva (540) -> because it is the exact phrase
Jose Martins Silva (5) -> because it contains the exact phrase, which (2) does not
Armistrong Oliveira Martins Da Silva (2) -> the distance between "martins" and "silva" is shorter than in (3)
Douglas Martins Vieira Da Silva (3)
Glauciene Silva Martins (4) -> lower priority because the words are out of order
Orleans Silva De Martins (1)

So I think I could solve this problem with an algorithm that considers the order, or the position, of the words in the query.

I tried calculating the Levenshtein distance, but it is really slow on a large database.

Is there a way to solve this in MySQL? Or would I have to use something like Apache Lucene? Or what am I doing wrong? This search is the main feature of my website and it has to work really well.

Thank you so much, guys!

In your particular case, you will need to implement a Levenshtein function to accomplish this; MATCH simply cannot rank the results this way. Sorting by Levenshtein distance in ascending order gives you the results from most relevant to least relevant.

Levenshtein function to add to your database:

DELIMITER $$
CREATE FUNCTION levenshtein( s1 VARCHAR(255), s2 VARCHAR(255) )
RETURNS INT
DETERMINISTIC
BEGIN
  DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
  DECLARE s1_char CHAR;
  -- cv1 = previous row of the distance matrix, cv0 = row being built;
  -- one byte per cell, so max strlen=255
  DECLARE cv0, cv1 VARBINARY(256);
  SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;
  IF s1 = s2 THEN
    RETURN 0;
  ELSEIF s1_len = 0 THEN
    RETURN s2_len;
  ELSEIF s2_len = 0 THEN
    RETURN s1_len;
  ELSE
    -- First row: distance from the empty string to each prefix of s2 (0, 1, 2, ...)
    WHILE j <= s2_len DO
      SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
    END WHILE;
    WHILE i <= s1_len DO
      SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
      WHILE j <= s2_len DO
        -- Candidate 1: left neighbour + 1
        SET c = c + 1;
        IF s1_char = SUBSTRING(s2, j, 1) THEN
          SET cost = 0;
        ELSE
          SET cost = 1;
        END IF;
        -- Candidate 2: diagonal neighbour + substitution cost
        SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
        IF c > c_temp THEN
          SET c = c_temp;
        END IF;
        -- Candidate 3: upper neighbour + 1
        SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
        IF c > c_temp THEN
          SET c = c_temp;
        END IF;
        SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
      END WHILE;
      SET cv1 = cv0, i = i + 1;
    END WHILE;
  END IF;
  -- c holds the last cell of the last row, i.e. the full edit distance
  RETURN c;
END$$
DELIMITER ;
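
Once the function has been created, a quick sanity check can confirm that it behaves as expected. The input strings below are illustrative only and are not taken from the original post:

```sql
-- Classic textbook pair: 3 edits (k->s, e->i, append g)
SELECT levenshtein('kitten', 'sitting');   -- 3

-- Identical strings short-circuit to 0
SELECT levenshtein('martins', 'martins');  -- 0

-- An empty first argument returns the length of the second
SELECT levenshtein('', 'silva');           -- 5
```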

This model query sorts by the most relevant lastname first and the most relevant name second, so the best match is the first row. Add LIMIT 1 to get only the most relevant result:

SELECT lastname, levenshtein(lastname, $var1) AS relevance1,
       name, levenshtein(name, $var2) AS relevance2
FROM tableName
ORDER BY relevance1 ASC, relevance2 ASC
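
Because the question notes that Levenshtein is slow on a large table, one possible way to limit the cost is to let the full-text index pick the candidates first and compute the distance only for those rows. This is a rough sketch, not a benchmarked solution. It assumes a hypothetical table named `users` with the FULLTEXT index on (name, lastname) from the question, and it uses the literal 'martins silva' as the search string:

```sql
-- Hypothetical table `users`; FULLTEXT index on (name, lastname) as in the question.
SELECT name,
       lastname,
       -- LEFT(..., 255) keeps the argument within the function's VARCHAR(255) limit
       levenshtein(LEFT(LOWER(CONCAT_WS(' ', name, lastname)), 255),
                   'martins silva') AS distance
FROM users
-- The index narrows the table down to rows containing both words
WHERE MATCH(name, lastname) AGAINST('+martins +silva' IN BOOLEAN MODE)
ORDER BY distance ASC
LIMIT 20;
```

The function then runs once per candidate row instead of once per row in the table. A row whose full name is exactly "Martins Silva" gets distance 0 and sorts first.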

If you want an exact search, you can merge the columns and compare against the combined value. Use something like this:

SELECT CONCAT(firstname, ' ', lastname) FROM tableName WHERE CONCAT(firstname, ' ', lastname) = 'Martins Silva';
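
The same concatenation idea can be stretched toward the ordering asked for in the question: exact name first, then the exact phrase inside a longer name, then both words in query order, then both words out of order. The following is only a sketch under several assumptions. The table name `users` is hypothetical, the columns are the `name` and `lastname` columns from the question, the collation is case-insensitive (the MySQL default), and `tier` and `word_gap` are illustrative names:

```sql
SELECT full_name,
       CASE
         WHEN full_name = 'martins silva'      THEN 0  -- exact full name
         WHEN full_name LIKE '%martins silva%' THEN 1  -- exact phrase inside a longer name
         WHEN full_name LIKE '%martins%silva%' THEN 2  -- both words, in query order
         ELSE 3                                        -- both words, out of order
       END AS tier,
       -- Character distance between the first occurrence of each word
       ABS(LOCATE('silva', full_name) - LOCATE('martins', full_name)) AS word_gap
FROM (
  SELECT CONCAT_WS(' ', name, lastname) AS full_name
  FROM users
  WHERE MATCH(name, lastname) AGAINST('+martins +silva' IN BOOLEAN MODE)
) AS candidates
ORDER BY tier ASC, word_gap ASC, CHAR_LENGTH(full_name) ASC;
```

For the six names listed in the question, this should reproduce the preferred order. Note that LIKE and LOCATE match substrings, so a name such as "Silvano" would also count as containing "silva". A production version would need proper word-boundary handling.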

The best solution for text search is Lucene.

Other text-search approaches tend to be slower, and Lucene has proven results in both performance and ease of coding: http://lucene.apache.org/

