How to optimize the search SQL query?

I have written a search query that looks for similar names. It works with the power set of the tags and sorts by similarity. For example, if the search text is: shakespeare tragedy hamlet

the generated SQL is:

SELECT DISTINCT id FROM (
    (SELECT * FROM books 
      WHERE name LIKE '%shakespeare%' 
      AND name LIKE '%tragedy%' 
      AND name LIKE '%hamlet%' limit 10)
    UNION
    (SELECT * FROM books 
      WHERE name LIKE '%shakespeare%' 
      AND name LIKE '%tragedy%' limit 10)
    UNION
    (SELECT * FROM books 
      WHERE name LIKE '%shakespeare%'  
      AND name LIKE '%hamlet%' limit 10)
    UNION
    (SELECT * FROM books 
      WHERE name LIKE '%tragedy%' 
      AND name LIKE '%hamlet%' limit 10)
    UNION
    (SELECT * FROM books WHERE name LIKE '%shakespeare%' limit 10)
    UNION
    (SELECT * FROM books WHERE name LIKE '%tragedy%' limit 10)
    UNION
    (SELECT * FROM books WHERE name LIKE '%hamlet%' limit 10)
) limit 10

There are two problems:

  1. The power set creates 2^tags - 1 unions in my query, which means that if someone wants to be precise and uses 6 tags, there will be 63 unions, which makes my query much slower.

  2. If the first union returns 10 rows, the others are useless.
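To make the growth concrete, here is a minimal Python sketch that enumerates the non-empty tag subsets a generator like this has to turn into UNION branches. The function name `tag_subsets` is hypothetical, and the language of the original generator is not stated in the post.

```python
from itertools import combinations


def tag_subsets(tags):
    """Return every non-empty subset of tags, largest subsets first."""
    subsets = []
    for size in range(len(tags), 0, -1):
        # combinations() yields each subset of the given size exactly once
        subsets.extend(combinations(tags, size))
    return subsets


three = tag_subsets(["shakespeare", "tragedy", "hamlet"])
six = tag_subsets(["a", "b", "c", "d", "e", "f"])
print(len(three), len(six))  # prints "7 63"
```

Each extra tag doubles the number of branches, which is why the query degrades so quickly.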

Is there a way to optimize this query?

We can get all books whose name matches at least one of the given tags and add a custom ORDER BY based on similarity. Each tag contained in the name counts +1, otherwise 0. So if the name contains all 3 tags the sum is 3, and if it contains just one the sum is 1.

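-- DISTINCT is dropped on the assumption that id is unique per row;
-- MySQL 5.7+ can reject DISTINCT combined with an ORDER BY expression that is not in the select list
SELECT id 
FROM books 
WHERE name LIKE '%shakespeare%'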
   OR name LIKE '%tragedy%'
   OR name LIKE '%hamlet%'
ORDER BY IF(INSTR(name, 'shakespeare')>0,1,0)+
         IF(INSTR(name, 'tragedy')>0,1,0)+
         IF(INSTR(name, 'hamlet')>0,1,0) DESC
LIMIT 10

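UPDATE: ORDER BY can use the sum of the match flags, or just list the flags separated by commas (with commas, rows are ordered by the first tag's flag, then the second, and so on).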
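For an arbitrary number of tags, the single ranked query can be generated with one condition per tag instead of one UNION branch per subset. The following is a minimal sketch, not a definitive implementation: `build_ranked_search` is a hypothetical helper, SQLite is used only so the example runs without a server, and the extra `id` tie-breaker is an addition to make the output deterministic. It relies on a LIKE comparison evaluating to 1 or 0, which holds in both SQLite and MySQL.

```python
import sqlite3


def build_ranked_search(tags, limit=10):
    """Build one query that matches any tag and ranks rows by how many tags match."""
    if not tags:
        raise ValueError("at least one tag is required")
    # One condition per tag: n conditions instead of 2^n - 1 UNION branches
    conditions = ["name LIKE ?"] * len(tags)
    sql = (
        "SELECT id FROM books WHERE "
        + " OR ".join(conditions)
        + " ORDER BY "
        # Each LIKE yields 1 or 0, so the sum is the number of matching tags
        + " + ".join(f"({c})" for c in conditions)
        + " DESC, id LIMIT ?"
    )
    # Note: % and _ inside a tag are not escaped in this sketch
    patterns = [f"%{tag}%" for tag in tags]
    # Patterns are bound twice: once for WHERE, once for ORDER BY
    return sql, patterns + patterns + [limit]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO books (id, name) VALUES (?, ?)",
    [
        (1, "hamlet"),
        (2, "shakespeare tragedy hamlet"),
        (3, "tragedy of shakespeare"),
        (4, "cookbook"),
    ],
)
sql, params = build_ranked_search(["shakespeare", "tragedy", "hamlet"])
ranked_ids = [row[0] for row in conn.execute(sql, params)]
print(ranked_ids)  # prints [2, 3, 1]
```

With a MySQL driver the placeholder would typically be `%s` rather than `?`. The leading `%` in each pattern still prevents an ordinary index from being used, so this removes the exponential number of branches but not the table scan.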

If you switch to a FULLTEXT index and use

MATCH(name) AGAINST('shakespeare tragedy hamlet')

you get a somewhat reasonable ordering, and the query runs a lot faster.

If you want to insist on shakespeare being in the string while the others are optional, this works better (it requires boolean mode, i.e. AGAINST('...' IN BOOLEAN MODE)): '+shakespeare tragedy hamlet'.
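As a rough sketch of how the boolean-mode search string could be assembled from user input, the hypothetical helper below prefixes required words with `+` and strips everything except word characters so that input cannot inject boolean operators. It assumes a FULLTEXT index already exists on `books.name` and that the MySQL driver uses `%s` placeholders; it only builds the statement and does not connect to a database.

```python
import re


def build_fulltext_search(required, optional, limit=10):
    """Build a MySQL boolean-mode FULLTEXT query; required words get a leading '+'."""

    def clean(words):
        # Keep only letters, digits and underscores; drop words that end up empty
        stripped = (re.sub(r"\W+", "", word) for word in words)
        return [w for w in stripped if w]

    terms = [f"+{w}" for w in clean(required)] + clean(optional)
    if not terms:
        raise ValueError("at least one search word is required")
    expression = " ".join(terms)
    sql = (
        "SELECT id, MATCH(name) AGAINST(%s IN BOOLEAN MODE) AS score "
        "FROM books "
        "WHERE MATCH(name) AGAINST(%s IN BOOLEAN MODE) "
        # Boolean mode does not sort by relevance on its own, so order explicitly
        "ORDER BY score DESC "
        "LIMIT %s"
    )
    # The expression is bound twice: once for the score, once for the filter
    return sql, (expression, expression, limit)


sql, params = build_fulltext_search(["shakespeare"], ["tragedy", "hamlet"])
print(params[0])  # prints "+shakespeare tragedy hamlet"
```

The returned pair can be passed to a cursor's `execute(sql, params)` in drivers that use the `%s` parameter style.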

Caveat: FULLTEXT has both benefits and limitations.
