Slow MySQL full text search

I'm using this query to perform a full text search on a MySQL database:

SELECT DISTINCT 
questions.id, 
questions.uniquecode, 
questions.spam,
questions.questiondate,
questions.userid,
questions.description,
users.login AS username,
questions.questiontext,
questions.totalvotes,
MATCH(questions.questiontext, questions.uniquecode) 
AGAINST ('rock guitarist chick*' IN BOOLEAN MODE) AS relevance 

FROM questions 

LEFT JOIN users ON questions.userid = users.id 
LEFT JOIN answer_mapping ON questions.id = answer_mapping.questionid 
LEFT JOIN answers ON answer_mapping.answerid = answers.id
LEFT JOIN tagmapping ON questions.id = tagmapping.questionid
LEFT JOIN tags ON tagmapping.tagid = tags.id 

WHERE questions.spam < 10 

AND 

(
  MATCH(questions.questiontext, questions.uniquecode) 
  AGAINST ('rock guitarist chick*' IN BOOLEAN MODE) 

OR MATCH(answers.answertext) AGAINST ('rock guitarist chick*' IN BOOLEAN MODE) 

OR MATCH (tags.tag) AGAINST ('rock guitarist chick*' IN BOOLEAN MODE)

) GROUP BY questions.id ORDER BY relevance DESC

The results are very relevant, but the search is really slow and is getting slower and slower as the tables grow.

Table stats:

questions - 400 records

indexes

  • PRIMARY - BTREE - id
  • BTREE - uniquecode
  • BTREE - questiondate
  • BTREE - userid
  • FULLTEXT - questiontext
  • FULLTEXT - uniquecode

answers - 3,635 records

indexes

  • PRIMARY - BTREE - id
  • BTREE - answerdate
  • BTREE - questionid
  • FULLTEXT - answertext

answer_mapping - 4,228 records

indexes

  • PRIMARY - BTREE - id
  • BTREE - answerid
  • BTREE - questionid
  • BTREE - userid

tags - 1,847 records

indexes

  • PRIMARY - BTREE - id
  • BTREE - tag
  • FULLTEXT - tag

tagmapping - 3,389 records

indexes

  • PRIMARY - BTREE - id
  • BTREE - tagid
  • BTREE - questionid

For whatever reason, when I remove the tagmapping and tags JOINs, the search speeds up considerably.

Do you have any tips on how to speed this query up?

Thanks in advance!

Well, you could combine your joins into a cached view, an extra table, or something similar. Keep the query cache active and define the join as a SELECT so it can be cached. Make sure there is enough memory, etc.; that shouldn't normally be the bottleneck, but in your case it probably is, because... only 400 records? That's nothing... and it's already slow? The rest looks good. What sort of hardware/configuration are you running?

But I think this is the wrong approach. MySQL isn't designed for that. In fact, the FULLTEXT feature was limited to MyISAM until MySQL 5.6 added it to InnoDB.

You should consider using Lucene/Solr with the dismax request handler. It should give you good results in about 50-100 ms with an index of some hundred thousand documents. At some point you can shard it, so the number of records is practically unlimited. Plus, you have better options and can achieve better results: for example, do fuzzy matching, give more weight to newer documents, make tags more relevant than the title, do post-query analysis, faceting, etc.

You might also try to run OPTIMIZE TABLE questions.

It helped speed up a similar query in a project I'm working on.

See reference: https://dev.mysql.com/doc/refman/5.7/en/fulltext-fine-tuning.html
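
A minimal sketch of that suggestion, using the table names from the question. It assumes the three tables that carry a FULLTEXT index are the ones worth rebuilding; whether it helps depends on how fragmented the tables and indexes are, and the referenced page describes extra settings that apply to InnoDB full-text indexes.

```sql
-- Rebuild and defragment each table that carries a FULLTEXT index.
-- Table names are taken from the question's index listing.
OPTIMIZE TABLE questions;
OPTIMIZE TABLE answers;
OPTIMIZE TABLE tags;
```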

Your formulation of the query is slow for multiple reasons, but I am unsure of the details. Please provide EXPLAIN FORMAT=JSON SELECT ... for further discussion.
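
For reference, a sketch of how that output can be produced: prefix the statement with EXPLAIN FORMAT=JSON (available from MySQL 5.6). The statement below is a shortened stand-in for the original query that keeps only the tags side of the joins; the table and column names come from the question, and the full query would be prefixed the same way.

```sql
-- Ask the optimizer for its plan instead of running the query.
EXPLAIN FORMAT=JSON
SELECT q.id,
       MATCH (q.questiontext, q.uniquecode)
           AGAINST ('rock guitarist chick*' IN BOOLEAN MODE) AS relevance
FROM questions AS q
LEFT JOIN tagmapping AS tm ON q.id = tm.questionid
LEFT JOIN tags AS t ON tm.tagid = t.id
WHERE q.spam < 10
  AND ( MATCH (q.questiontext, q.uniquecode)
            AGAINST ('rock guitarist chick*' IN BOOLEAN MODE)
        OR MATCH (t.tag)
            AGAINST ('rock guitarist chick*' IN BOOLEAN MODE) );
```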

Meanwhile, let's rewrite the query in a way that should work faster. (And it might get rid of a bug you have not yet encountered.)

First, let's build and debug this. It does the 3 FT searches in 3 separate queries, then combines (UNION) just the question_ids from each.

    ( SELECT id AS question_id,
         MATCH (...) AGAINST ... AS relevance
         FROM questions
         WHERE MATCH (questiontext, ...) AGAINST ... )
    UNION ALL
    ( SELECT am.questionid AS question_id,
         MATCH (...) AGAINST ... AS relevance
         FROM answers AS a
         JOIN answer_mapping AS am ON am.answerid = a.id
         WHERE MATCH (a.answertext) AGAINST ... )
    UNION ALL
    ( SELECT tm.questionid AS question_id,
         MATCH (...) AGAINST ... AS relevance
         FROM tags AS t
         JOIN tagmapping AS tm ON ...
         WHERE MATCH (t.tag) AGAINST ... )

Notice how each subquery is designed to start with the table with the FT index and end up with question_id.
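
One related point, offered as an assumption rather than a diagnosis: the index list in the question shows separate FULLTEXT indexes on questiontext and uniquecode, while the query matches on both columns together. MySQL can use a FULLTEXT index for MATCH only when the index covers exactly the columns listed. InnoDB requires this; MyISAM in boolean mode will run without it, but slowly. If no combined index exists, adding one might look like the sketch below (the index name ft_questiontext_uniquecode is made up for illustration):

```sql
-- Hypothetical index name; the column list mirrors the MATCH() call
-- in the question so the optimizer can use the index for it.
ALTER TABLE questions
    ADD FULLTEXT INDEX ft_questiontext_uniquecode (questiontext, uniquecode);
```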

Now, an intermediate query:

SELECT question_id,
         MAX(relevance) AS relevance  -- (this fixes the unseen bug)
    FROM ( that query ) AS q1
    GROUP BY question_id
    ORDER BY relevance DESC  -- optional; needed for `LIMIT`
    LIMIT 20          -- to limit the rows, do it at this stage

If that works out fast enough, and provides the "correct" question_ids, then we can proceed...

Use that as a subquery to get to the rest of the data:

SELECT ....,  -- the `questions` fields, using `q....`
       ( SELECT login FROM users WHERE users.id = q.userid ) AS username
    FROM ( the intermediate query ) AS q2
    JOIN questions AS q ON q.id = q2.question_id
    WHERE q.spam < 10
    ORDER BY q2.relevance DESC

Yes, this is JOINing back to questions, but that turns out to be faster.

Note that the GROUP BY is not needed here. And, if the inner query had LIMIT, it won't be needed here.
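
Putting the three steps together, the assembled statement might look like the sketch below. It is not tested against the asker's data. The search string, the column names (questions.id, answer_mapping.questionid, tagmapping.questionid, tagmapping.tagid) and the join conditions are taken from the original query; question_id is only an alias introduced for the derived tables, and the parentheses around each UNION branch are dropped because no branch has its own ORDER BY or LIMIT.

```sql
SELECT q.id,
       q.uniquecode,
       q.spam,
       q.questiondate,
       q.userid,
       q.description,
       q.questiontext,
       q.totalvotes,
       ( SELECT u.login FROM users AS u WHERE u.id = q.userid ) AS username,
       q2.relevance
FROM (
    -- Step 2: one row per question, keeping its best relevance.
    SELECT question_id,
           MAX(relevance) AS relevance
    FROM (
        -- Step 1: three independent FT searches, each starting
        -- from the table that owns the FULLTEXT index.
        SELECT id AS question_id,
               MATCH (questiontext, uniquecode)
                   AGAINST ('rock guitarist chick*' IN BOOLEAN MODE) AS relevance
        FROM questions
        WHERE MATCH (questiontext, uniquecode)
              AGAINST ('rock guitarist chick*' IN BOOLEAN MODE)
        UNION ALL
        SELECT am.questionid AS question_id,
               MATCH (a.answertext)
                   AGAINST ('rock guitarist chick*' IN BOOLEAN MODE) AS relevance
        FROM answers AS a
        JOIN answer_mapping AS am ON am.answerid = a.id
        WHERE MATCH (a.answertext)
              AGAINST ('rock guitarist chick*' IN BOOLEAN MODE)
        UNION ALL
        SELECT tm.questionid AS question_id,
               MATCH (t.tag)
                   AGAINST ('rock guitarist chick*' IN BOOLEAN MODE) AS relevance
        FROM tags AS t
        JOIN tagmapping AS tm ON tm.tagid = t.id
        WHERE MATCH (t.tag)
              AGAINST ('rock guitarist chick*' IN BOOLEAN MODE)
    ) AS q1
    GROUP BY question_id
    ORDER BY relevance DESC
    LIMIT 20
) AS q2
-- Step 3: join back to questions for the display columns.
JOIN questions AS q ON q.id = q2.question_id
WHERE q.spam < 10
ORDER BY q2.relevance DESC;
```

Because the spam filter runs after the LIMIT, this form can return fewer than 20 rows when some of the top matches have spam >= 10.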

I apologize if I did not quite get everything right; there were more transformations than I expected.
