MySQL - How to get search results with accurate relevance

I have revisited this problem many times, and I have never really found a proper answer.

Is it possible to perform a MySQL search that returns results actually sorted by relevance?

I am trying to create an ajax search form that makes suggestions as the user types into an input field, and I have found no decent solution using only pure MySQL queries. I know there are search servers available, such as ElasticSearch, but I want to know how to do it with a raw MySQL query only.


I have a table of school subjects. There are fewer than 1200 rows and this will never change. Let's perform a basic FULLTEXT search where the user starts typing "Bio".

Query ("Bio...") - FULLTEXT BOOLEAN MODE

SELECT name, MATCH(name) AGAINST('bio*' IN BOOLEAN MODE) AS relevance
FROM subjects
WHERE MATCH(name) AGAINST('bio*' IN BOOLEAN MODE)
ORDER BY relevance DESC
LIMIT 10

Results

name                                        |  relevance
--------------------------------------------------------
Biomechanics, Biomaterials and Prosthetics  |  1
Applied Biology                             |  1
Behavioural Biology                         |  1
Cell Biology                                |  1
Applied Cell Biology                        |  1
Developmental/Reproductive Biology          |  1
Developmental Biology                       |  1
Reproductive Biology                        |  1
Environmental Biology                       |  1
Marine/Freshwater Biology                   |  1

To show how poor these results are, compare them with a simple LIKE query, which returns more relevant results that the FULLTEXT query did not show:

Query ("Bio...") - LIKE

SELECT id, name
FROM subjects
WHERE name LIKE 'bio%'
ORDER BY name

Results

name                                        |  relevance
--------------------------------------------------------
Bio-organic Chemistry                       |  1
Biochemical Engineering                     |  1
Biodiversity                                |  1
Bioengineering                              |  1
Biogeography                                |  1
Biological Chemistry                        |  1
Biological Sciences                         |  1
Biology                                     |  1
Biomechanics, Biomaterials and Prosthetics  |  1
Biometry                                    |  1

And already you see how many subjects are not suggested, even though these are more likely what the user will be looking for.

The problem with using LIKE, however, is how to search across multiple words and in the middle of words, as FULLTEXT does.

The basic ordering I would want to implement is something like:

  1. First words starting with the search term
  2. Second words starting with the search term
  3. Words where the term is not at the start of the words
  4. Everything generally alphabetical if not further relevant

So my question is, how does one go about getting a sensibly sorted list of suggestions for the user with a MySQL search across multiple words?

You could use string functions, such as:

select id, name
from subjects
where name like concat('%', @search, '%')
order by 
  name like concat(@search, '%') desc,
  ifnull(nullif(instr(name, concat(' ', @search)), 0), 99999),
  ifnull(nullif(instr(name, @search), 0), 99999),
  name;

This returns every entry containing @search: first those that start with it, then those where it follows a space (that is, where it starts a later word), then by the position of the first occurrence, then alphabetically.
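
To make the ordering concrete outside the database, here is a minimal Python sketch of the same four sort keys. It is only an illustration, not part of the query: the names `position` and `suggest` and the sample list are made up for this example, and lower-casing stands in for MySQL's default case-insensitive comparison.

```python
BIG = 99999  # same "not found" sentinel the SQL uses


def position(haystack: str, needle: str) -> int:
    """1-based position like MySQL INSTR(), or BIG when the needle is absent."""
    idx = haystack.lower().find(needle.lower())
    return idx + 1 if idx >= 0 else BIG


def suggest(names, search):
    """Filter like `LIKE '%search%'` and sort with the answer's four keys."""
    s = search.lower()
    hits = [n for n in names if s in n.lower()]
    return sorted(
        hits,
        key=lambda n: (
            not n.lower().startswith(s),  # names starting with the term first
            position(n, " " + search),    # then the term right after a space
            position(n, search),          # then the earliest occurrence
            n.lower(),                    # then alphabetical
        ),
    )


NAMES = [
    "Applied Biology",
    "Cell Biology",
    "Biology",
    "Microbiology",
    "Biochemical Engineering",
    "Applied Cell Biology",
]

print(suggest(NAMES, "bio"))
```

Sorting on a tuple plays the role of the comma-separated ORDER BY list: later keys only matter when the earlier ones tie.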

name like concat(@search, '%') desc uses MySQL's boolean logic by the way. 1 = true, 0 = false, so ordering this descending gives you true first.

SQL fiddle: http://sqlfiddle.com/#!9/c6321a/1

For others landing here (like I did): in my experience, for best results you can use a conditional depending on the number of search words. If there is only one word, use LIKE '%word%'; otherwise use boolean full-text searches, like this:

// 'word1' and 'word2' are placeholders for the user's search words.
if (count($keywords) > 1) {
    $query = "SELECT *,
                     MATCH (col1) AGAINST ('+word1* +word2*' IN BOOLEAN MODE) AS relevance1,
                     MATCH (col2) AGAINST ('+word1* +word2*' IN BOOLEAN MODE) AS relevance2
              FROM table1 c
              LEFT JOIN table2 p ON p.id = c.id
              WHERE MATCH (col1, col2) AGAINST ('+word1* +word2*' IN BOOLEAN MODE)
              HAVING (relevance1 + relevance2) > 0
              ORDER BY relevance1 DESC;";
    $execute_query = $this->conn->prepare($query);
} else {
    $query = "SELECT * FROM table1_description c
              LEFT JOIN table2 p ON p.product_id = c.product_id
              WHERE column1 LIKE ? AND column2 LIKE ?;";
    $execute_query = $this->conn->prepare($query);
    // sanitize
    $word = htmlspecialchars(strip_tags($keywords[0]));
    // wrap in wildcards so the word matches anywhere in the column
    $word = "%{$word}%";
    $execute_query->bindParam(1, $word);
    $execute_query->bindParam(2, $word);
}
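
The snippet above hard-codes '+word1* +word2*'. As a rough sketch of how that string could be built from real input, here is the same branching in Python (used here only for illustration; the PHP equivalent would be similar). The helper names `boolean_mode_term` and `choose_search` are hypothetical, and stripping the boolean-mode operator characters is an assumption about how you might want to treat user input, not something the answer prescribes.

```python
import re

# Characters that act as operators in MySQL boolean-mode full-text search.
_OPERATORS = re.compile(r'[+\-<>()~*"@]')


def boolean_mode_term(keywords):
    """Build '+word1* +word2*' from a list of words, dropping operator characters."""
    cleaned = (_OPERATORS.sub("", w) for w in keywords)
    return " ".join(f"+{w}*" for w in cleaned if w)


def choose_search(raw):
    """Return ('fulltext', term) for several words, ('like', pattern) for one."""
    words = raw.split()
    if len(words) > 1:
        return ("fulltext", boolean_mode_term(words))
    word = words[0] if words else ""
    return ("like", f"%{word}%")


print(choose_search("john smit"))
print(choose_search("bio"))
```

Either result should still be passed to the database as a bound parameter rather than concatenated into the SQL string.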

These are the best results I could get using a combination of the answers above:

$searchTerm = 'John';
// $searchTerm = 'John Smit';
// Note: $searchTerm is interpolated straight into the SQL here;
// escape it or use bound parameters for real user input.
if (substr_count($searchTerm, ' ') <= 1) {
    $sql = "SELECT id, name
        FROM people
        WHERE name LIKE '%{$searchTerm}%'
        ORDER BY
          name LIKE '{$searchTerm}%' DESC,
          ifnull(nullif(instr(name, ' {$searchTerm}'), 0), 99999),
          ifnull(nullif(instr(name, '{$searchTerm}'), 0), 99999),
          name
        LIMIT 10";
} else {
    // e.g. 'John Adam Smit' becomes '+John +Adam +Smit*'
    $searchTerm = '+' . str_replace(' ', ' +', $searchTerm) . '*';
    $sql = "SELECT id, name, MATCH(lead.name) AGAINST('{$searchTerm}' IN BOOLEAN MODE) AS SCORE
        FROM lead
        WHERE MATCH(lead.name) AGAINST('{$searchTerm}' IN BOOLEAN MODE)
        ORDER BY `SCORE` DESC
        LIMIT 10";
}

Make sure you create a FULLTEXT index on the column (or on multiple columns, if that's what you end up using) and rebuild the index using OPTIMIZE TABLE table_name.

The best thing about this is that if you type Jo, a person named Jo will rank higher than John, which is exactly what you want!

I tried this based on your described ordering.

SET @src := 'bio';
SELECT name,
name LIKE (CONCAT(@src,'%')),
         LEFT(SUBSTRING_INDEX(SUBSTRING_INDEX(name,' ',2),' ',-1),LENGTH(@src)) = @src,
         name LIKE (CONCAT('%',@src,'%'))
FROM subjects
ORDER BY name LIKE (CONCAT(@src,'%')) DESC,
         LEFT(SUBSTRING_INDEX(SUBSTRING_INDEX(name,' ',2),' ',-1),LENGTH(@src)) = @src DESC,
         name LIKE (CONCAT('%',@src,'%')) DESC,
         name

http://sqlfiddle.com/#!9/6bffa/1
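
The nested SUBSTRING_INDEX call is the least obvious part of that query: the inner call keeps everything before the second space, and the outer call keeps everything after the last remaining space, which leaves the second word. A small Python sketch of the same two steps, with `second_word` and `second_word_starts_with` as hypothetical helper names:

```python
def second_word(name: str) -> str:
    """Mimic SUBSTRING_INDEX(SUBSTRING_INDEX(name, ' ', 2), ' ', -1)."""
    # Inner call: everything before the second space (whole string if fewer).
    first_two = " ".join(name.split(" ")[:2])
    # Outer call: everything after the last space of that result.
    return first_two.rsplit(" ", 1)[-1]


def second_word_starts_with(name: str, src: str) -> bool:
    """Mimic LEFT(<second word>, LENGTH(@src)) = @src, ignoring case."""
    return second_word(name)[: len(src)].lower() == src.lower()


for subject in ["Applied Cell Biology", "Cell Biology", "Biology"]:
    print(subject, "->", second_word(subject), second_word_starts_with(subject, "bio"))
```

Note that for a single-word name the expression falls back to the first word, so such a row passes both the first-word and the second-word test. That does not affect the final order, because the first sort key already places it at the top.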

You might also want to take the number of occurrences of @src into account; see "Count the number of occurrences of a string in a VARCHAR field?".
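
A common way to count occurrences in SQL is to compare the length of the string before and after removing the term, then divide by the term's length. The Python sketch below mirrors that arithmetic so the idea is easy to check; `count_occurrences` is a hypothetical helper, and the lower-casing is an assumption made to keep the count case-insensitive (MySQL's REPLACE() matches case-sensitively, so an SQL version would need LOWER() on both sides).

```python
def count_occurrences(text: str, term: str) -> int:
    """Count non-overlapping occurrences via the length-difference trick."""
    if not term:
        return 0
    lowered, needle = text.lower(), term.lower()
    removed = lowered.replace(needle, "")
    # Each removed occurrence shortens the string by len(needle) characters.
    return (len(lowered) - len(removed)) // len(needle)


SUBJECTS = [
    "Applied Biology",
    "Biomechanics, Biomaterials and Prosthetics",
    "Biology",
]

# More occurrences first, then alphabetical.
ranked = sorted(SUBJECTS, key=lambda n: (-count_occurrences(n, "bio"), n.lower()))
print(ranked)
```

In the query, such a count would fit as one more ORDER BY key, sorted descending, ahead of the final alphabetical key.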

MATCH(s.name) AGAINST('"Applied Bio"' IN BOOLEAN MODE)

The statement above searches for the exact phrase, meaning both words must appear together, in that order, in each matching record.

ORDER BY s.name like concat("Applied Bio", '%') desc,
ifnull(nullif(instr(s.name, concat(' ', "Applied Bio")), 0), 99999),
ifnull(nullif(instr(s.name, "Applied Bio"), 0), 99999),
s.name

This sorts names that start with the search term first.

Full SQL statement:

SELECT SQL_NO_CACHE 
s.id, s.name
FROM subjects s use index(name_fulltext) 
WHERE 
MATCH(s.name) AGAINST('"Applied Bio"' IN BOOLEAN MODE) 
GROUP BY s.id 
ORDER BY 
s.name like concat("Applied Bio", '%') desc,
ifnull(nullif(instr(s.name, concat(' ', "Applied Bio")), 0), 99999),
ifnull(nullif(instr(s.name, "Applied Bio"), 0), 99999),
s.name
LIMIT 100;

To get what you want, you could combine several CASE WHEN expressions with MySQL's REGEXP operator, which would give you an exact score per row based on your requirements. REGEXP might be the piece of the puzzle you're missing. See https://dev.mysql.com/doc/refman/5.6/en/regexp.html (Answering on my phone, so it's hard to format the answer or give examples.)
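
Since this answer gives no example, here is a rough sketch of the scoring idea in Python, where each `if` plays the role of one WHEN ... REGEXP ... THEN branch. The tier values (3, 2, 1, 0), the function names `relevance` and `rank`, and the use of `\b` as the word-boundary test are all assumptions made for this illustration. MySQL's own regular expression syntax differs between versions, so the patterns would need adapting.

```python
import re


def relevance(name: str, term: str) -> int:
    """Score a row: 3 = name starts with term, 2 = a later word starts with it,
    1 = term appears inside a word, 0 = no match."""
    t = re.escape(term)
    if re.match(t, name, re.IGNORECASE):
        return 3
    if re.search(r"\b" + t, name, re.IGNORECASE):
        return 2
    if re.search(t, name, re.IGNORECASE):
        return 1
    return 0


def rank(names, term):
    """Keep matching names, highest score first, ties broken alphabetically."""
    scored = [(relevance(n, term), n) for n in names]
    return [n for score, n in sorted(scored, key=lambda p: (-p[0], p[1].lower())) if score > 0]


print(rank(["Microbiology", "Applied Biology", "Biology", "Chemistry"], "bio"))
```

The tiers map onto the ordering the question asks for: first word, later word, mid-word, then alphabetical within each tier.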

