
Optimizing fulltext search across multiple tables

I want to search for the requested term ($q) in the title and keywords of my content table, but also in the names of the models (actors), which are stored in another table and linked to the content through a table in between. I also need to get the number of views, which is stored in yet another table.

This is the query I have been working on so far. The results are fine, but it's way too slow: 0.6 s on average when I run it in phpMyAdmin, and we have millions of visitors per month.

SELECT DISTINCT SQL_CALC_FOUND_ROWS
    c.*,
    cv.views,
    (MATCH (c.title) AGAINST ('{$q}') * 3) Relevance1,
    MATCH (c.keywords) AGAINST ('{$q}') Relevance2,
    (MATCH (a.`name`) AGAINST ('{$q}') * 2) Relevance3
FROM
    content AS c
LEFT JOIN
    content_actors AS ca ON ca.content = c.record_num
LEFT JOIN
    actors AS a ON a.record_num = ca.actor
LEFT JOIN
    content_views AS cv ON cv.content = c.record_num
WHERE
    c.enabled = 1
GROUP BY c.title, c.length
HAVING (Relevance1 + Relevance2 + Relevance3) > 0
ORDER BY (Relevance1 + Relevance2 + Relevance3) DESC

The table architecture looks like this:

content
record_num     title     keywords
1              Video1    Comedy, Action, Supercool
2              Video2    Comet

content_actors
content     actor
1           1
1           2
2           1

actors
record_num     name
1              Jennifer Lopez
2              Bruce Willis

content_views
content     views
1           160
2           312

Here are the indexes I found by running SHOW INDEX FROM tablename:

Table              Column_Name     Seq_in_index     Key_name     Index_type
---------------------------------------------------------------------------
content            record_num      1                PRIMARY      BTREE
content            keywords        1                keywords     FULLTEXT
content            keywords        2                title        FULLTEXT
content            title           1                title        FULLTEXT
content            description     1                description  FULLTEXT
content            keywords        1                keywords_2   FULLTEXT

content_actors     content         1                content      BTREE
content_actors     actor           2                content      BTREE
content_actors     actor           1                actor        BTREE

actors             record_num      1                PRIMARY      BTREE
actors             name            1                name         BTREE
actors             name            1                name_2       FULLTEXT

content_views      content         1                PRIMARY      BTREE
content_views      views           1                views        BTREE

Here is the EXPLAIN output for the query:

ID     SELECT_TYPE     TABLE     TYPE       POSSIBLE_KEYS          KEY         ROWS      EXTRA
1      SIMPLE          c         ref        enabled_2, enabled     enabled     29210     Using where; Using temporary; Using filesort
1      SIMPLE          ca        ref        content                content     1         Using index
1      SIMPLE          a         eq_ref     PRIMARY                PRIMARY     1
1      SIMPLE          cv        eq_ref     PRIMARY                PRIMARY     1

I am using the GROUP BY to avoid duplicate content, but the GROUP BY alone seems to double the time required to process the query.

EDIT: After playing with the query a bit, I realized that if I remove the GROUP BY I get duplicates, but if I leave the GROUP BY in, it doesn't pick up the proper Relevance3 value (without the GROUP BY, one of the duplicate rows has a value for Relevance3 while the other does not...).
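The duplicates and the inconsistent Relevance3 both come from join fan-out: Video1 has two actors, so the LEFT JOINs produce one row per (content, actor) pair, and only the row for the matching actor gets a non-zero Relevance3. A minimal sketch reproducing this with the sample data above, using Python's built-in sqlite3 module rather than MySQL. SQLite has no MATCH ... AGAINST, so `match_score` is a hypothetical stand-in that just counts whole-word hits; it is not MySQL's relevance formula.

```python
import re
import sqlite3

def match_score(text, query):
    # Hypothetical stand-in for MySQL's MATCH ... AGAINST: counts whole-word,
    # case-insensitive hits of each query word. NULL text scores 0.
    if text is None:
        return 0.0
    words = re.findall(r"\w+", text.lower())
    return float(sum(words.count(t) for t in query.lower().split()))

db = sqlite3.connect(":memory:")
db.create_function("match_score", 2, match_score)
db.executescript("""
    CREATE TABLE content (record_num INTEGER PRIMARY KEY, title TEXT, keywords TEXT);
    CREATE TABLE content_actors (content INTEGER, actor INTEGER);
    CREATE TABLE actors (record_num INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO content VALUES (1, 'Video1', 'Comedy, Action, Supercool'), (2, 'Video2', 'Comet');
    INSERT INTO content_actors VALUES (1, 1), (1, 2), (2, 1);
    INSERT INTO actors VALUES (1, 'Jennifer Lopez'), (2, 'Bruce Willis');
""")

# Same join shape as the question's query, but without GROUP BY.
rows = db.execute("""
    SELECT c.record_num, 2 * match_score(a.name, :q) AS relevance3
    FROM content AS c
    LEFT JOIN content_actors AS ca ON ca.content = c.record_num
    LEFT JOIN actors AS a ON a.record_num = ca.actor
""", {"q": "Bruce"}).fetchall()

# Video1 (record 1) appears once per actor, with different Relevance3 values.
video1 = sorted(rel for num, rel in rows if num == 1)
print(video1)  # [0.0, 2.0]
```

With a GROUP BY and no aggregate around Relevance3, MySQL is free to return either of those two values for Video1, which fits the behavior described in the EDIT.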

Add the MATCHes (OR'd together) to the WHERE clause. This will significantly cut the number of rows that SQL_CALC_FOUND_ROWS has to handle, and it eliminates the need for the HAVING.
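A minimal sketch of the effect on the same toy data, using Python's sqlite3 module with the hypothetical `match_score` function standing in for MATCH ... AGAINST: with the OR'd conditions in WHERE, rows that match nowhere are dropped before any grouping or row counting happens.

```python
import re
import sqlite3

def match_score(text, query):
    # Hypothetical stand-in for MATCH ... AGAINST (whole-word hit count).
    if text is None:
        return 0.0
    words = re.findall(r"\w+", text.lower())
    return float(sum(words.count(t) for t in query.lower().split()))

db = sqlite3.connect(":memory:")
db.create_function("match_score", 2, match_score)
db.executescript("""
    CREATE TABLE content (record_num INTEGER PRIMARY KEY, title TEXT, keywords TEXT);
    CREATE TABLE content_actors (content INTEGER, actor INTEGER);
    CREATE TABLE actors (record_num INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO content VALUES (1, 'Video1', 'Comedy, Action, Supercool'), (2, 'Video2', 'Comet');
    INSERT INTO content_actors VALUES (1, 1), (1, 2), (2, 1);
    INSERT INTO actors VALUES (1, 'Jennifer Lopez'), (2, 'Bruce Willis');
""")

JOINS = """
    FROM content AS c
    LEFT JOIN content_actors AS ca ON ca.content = c.record_num
    LEFT JOIN actors AS a ON a.record_num = ca.actor
"""

# Without a WHERE, every (content, actor) row reaches GROUP BY / HAVING.
all_rows = db.execute("SELECT c.record_num" + JOINS).fetchall()

# With the MATCHes OR'd together in WHERE, only matching rows survive.
filtered = db.execute("SELECT c.record_num" + JOINS + """
    WHERE match_score(c.title, :q) > 0
       OR match_score(c.keywords, :q) > 0
       OR match_score(a.name, :q) > 0
""", {"q": "Bruce"}).fetchall()

print(len(all_rows), filtered)  # 3 [(1,)]
```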

Instead of

cv.views,
...
LEFT JOIN  content_views AS cv ON cv.content = c.record_num

do

( SELECT views FROM content_views WHERE content = c.record_num ) AS views,
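A quick way to check that the correlated subquery returns the same thing as the LEFT JOIN, sketched with Python's sqlite3 module. The third content row, which has no views, is hypothetical; like the LEFT JOIN, the subquery yields NULL (None) for it. Because content_views.content is the PRIMARY KEY (see SHOW INDEX above), the subquery can return at most one row per content.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE content (record_num INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE content_views (content INTEGER PRIMARY KEY, views INTEGER);
    INSERT INTO content VALUES (1, 'Video1'), (2, 'Video2'), (3, 'Video3');
    INSERT INTO content_views VALUES (1, 160), (2, 312);
""")

# Original form: LEFT JOIN on content_views.
joined = db.execute("""
    SELECT c.record_num, cv.views
    FROM content AS c
    LEFT JOIN content_views AS cv ON cv.content = c.record_num
    ORDER BY c.record_num
""").fetchall()

# Suggested form: correlated scalar subquery in the SELECT list.
correlated = db.execute("""
    SELECT c.record_num,
           (SELECT cv.views FROM content_views AS cv
             WHERE cv.content = c.record_num) AS views
    FROM content AS c
    ORDER BY c.record_num
""").fetchall()

print(correlated)  # [(1, 160), (2, 312), (3, None)]
```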

Edit

The LEFT and GROUP BY are needed because the actors are optional and there could be multiple actors. Since you don't need the actor name in the output, you can probably get rid of the LEFT JOINs and the GROUP BY by doing

WHERE ... AND EXISTS ( SELECT *
                       FROM content_actors AS ca
                       JOIN actors AS a ON ...
                       WHERE MATCH (a.`name`) AGAINST ('{$q}')
                         AND ca...
                     )

but that does not let you include the actor relevance in the ORDER BY.
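A sketch of the EXISTS filter on the sample data, again in Python's sqlite3 module with the hypothetical `match_score` stand-in; the join and correlation conditions are the ones from the question's query. Video1 comes back only once even though it has two actors, but the result carries no per-actor score that could go into an ORDER BY.

```python
import re
import sqlite3

def match_score(text, query):
    # Hypothetical stand-in for MATCH ... AGAINST (whole-word hit count).
    if text is None:
        return 0.0
    words = re.findall(r"\w+", text.lower())
    return float(sum(words.count(t) for t in query.lower().split()))

db = sqlite3.connect(":memory:")
db.create_function("match_score", 2, match_score)
db.executescript("""
    CREATE TABLE content (record_num INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE content_actors (content INTEGER, actor INTEGER);
    CREATE TABLE actors (record_num INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO content VALUES (1, 'Video1'), (2, 'Video2');
    INSERT INTO content_actors VALUES (1, 1), (1, 2), (2, 1);
    INSERT INTO actors VALUES (1, 'Jennifer Lopez'), (2, 'Bruce Willis');
""")

def matching_content(q):
    # One row per content that has at least one matching actor; no duplicates.
    return db.execute("""
        SELECT c.record_num
        FROM content AS c
        WHERE EXISTS (
            SELECT *
            FROM content_actors AS ca
            JOIN actors AS a ON a.record_num = ca.actor
            WHERE match_score(a.name, :q) > 0
              AND ca.content = c.record_num
        )
        ORDER BY c.record_num
    """, {"q": q}).fetchall()

print(matching_content("Bruce"))     # [(1,)]
print(matching_content("Jennifer"))  # [(1,), (2,)]
```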

So, you need to build a subquery from 2 SELECTs combined with UNION ALL (UNION DISTINCT would collapse a content row that happens to get the same relevance from both SELECTs, so it would not be summed correctly):

SELECT #1:

SELECT c.record_num AS id,
       3 * MATCH(c.title) AGAINST ('{$q}')
       +   MATCH(c.keywords) AGAINST ('{$q}')  AS relevance
    FROM content AS c
    WHERE MATCH(c.title, c.keywords) AGAINST ('{$q}')

(and have FULLTEXT(title, keywords)). This will efficiently fetch the ids of the content rows that are useful.
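SELECT #1 can be sketched on the sample content rows with Python's sqlite3 module. `match_score` is a hypothetical stand-in for MATCH ... AGAINST, and concatenating title and keywords in the WHERE only imitates a single MATCH(title, keywords) over a combined FULLTEXT index; SQLite uses no such index here.

```python
import re
import sqlite3

def match_score(text, query):
    # Hypothetical stand-in for MATCH ... AGAINST (whole-word hit count).
    if text is None:
        return 0.0
    words = re.findall(r"\w+", text.lower())
    return float(sum(words.count(t) for t in query.lower().split()))

db = sqlite3.connect(":memory:")
db.create_function("match_score", 2, match_score)
db.executescript("""
    CREATE TABLE content (record_num INTEGER PRIMARY KEY, title TEXT, keywords TEXT);
    INSERT INTO content VALUES (1, 'Video1', 'Comedy, Action, Supercool'), (2, 'Video2', 'Comet');
""")

# Title hits weigh 3, keyword hits weigh 1; non-matching rows are filtered out.
rows = db.execute("""
    SELECT c.record_num AS id,
           3 * match_score(c.title, :q) + match_score(c.keywords, :q) AS relevance
    FROM content AS c
    WHERE match_score(c.title || ' ' || c.keywords, :q) > 0
    ORDER BY id
""", {"q": "Video1 Comet"}).fetchall()

print(rows)  # [(1, 3.0), (2, 1.0)]
```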

SELECT #2:

SELECT c.record_num AS id,
       2 * MAX(MATCH(a.`name`) AGAINST ('{$q}')) AS relevance
    FROM content AS c
    JOIN content_actors ca  ON ca.content = c.record_num
    JOIN actors a  ON a.record_num = ca.actor
    WHERE MATCH(a.`name`) AGAINST ('{$q}')
    GROUP BY c.record_num

Be sure to have content_actors: INDEX(actor) and content: INDEX(record_num). This SELECT will efficiently start with actors and work back to content. Also note that it does something different from your code when two actors of the same content MATCH: it keeps the higher actor score (MAX) instead of whichever row the GROUP BY happens to keep. Hopefully MAX is a better solution.
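SELECT #2 in the same sketch style (Python's sqlite3 module, hypothetical `match_score`). The extra actor 'Bruce Lee' on Video1 is hypothetical, added so that two actors of the same content match; the `summed` column is only there to compare MAX with what a SUM would give.

```python
import re
import sqlite3

def match_score(text, query):
    # Hypothetical stand-in for MATCH ... AGAINST (whole-word hit count).
    if text is None:
        return 0.0
    words = re.findall(r"\w+", text.lower())
    return float(sum(words.count(t) for t in query.lower().split()))

db = sqlite3.connect(":memory:")
db.create_function("match_score", 2, match_score)
db.executescript("""
    CREATE TABLE content (record_num INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE content_actors (content INTEGER, actor INTEGER);
    CREATE TABLE actors (record_num INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO content VALUES (1, 'Video1'), (2, 'Video2');
    INSERT INTO content_actors VALUES (1, 1), (1, 2), (2, 1), (1, 3);
    INSERT INTO actors VALUES (1, 'Jennifer Lopez'), (2, 'Bruce Willis'), (3, 'Bruce Lee');
""")

# One row per content: the best matching actor's score, doubled.
rows = db.execute("""
    SELECT c.record_num AS id,
           2 * MAX(match_score(a.name, :q)) AS relevance,
           2 * SUM(match_score(a.name, :q)) AS summed
    FROM content AS c
    JOIN content_actors AS ca ON ca.content = c.record_num
    JOIN actors AS a ON a.record_num = ca.actor
    WHERE match_score(a.name, :q) > 0
    GROUP BY c.record_num
""", {"q": "Bruce"}).fetchall()

print(rows)  # [(1, 2.0, 4.0)]
```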

Now, let's put things together...

SELECT #3:

SELECT id, SUM(relevance) AS relevance
    FROM ( ( ... select #1 ... )
           UNION ALL
           ( ... select #2 ... )
         ) AS x
    GROUP BY id
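SELECT #3 sketched with Python's sqlite3 module and the hypothetical `match_score`. The search words are chosen so that both SELECTs give Video1 the same relevance (2.0), which shows why UNION ALL is used before the SUM: a plain UNION (DISTINCT) collapses the two identical rows and under-counts.

```python
import re
import sqlite3

def match_score(text, query):
    # Hypothetical stand-in for MATCH ... AGAINST (whole-word hit count).
    if text is None:
        return 0.0
    words = re.findall(r"\w+", text.lower())
    return float(sum(words.count(t) for t in query.lower().split()))

db = sqlite3.connect(":memory:")
db.create_function("match_score", 2, match_score)
db.executescript("""
    CREATE TABLE content (record_num INTEGER PRIMARY KEY, title TEXT, keywords TEXT);
    CREATE TABLE content_actors (content INTEGER, actor INTEGER);
    CREATE TABLE actors (record_num INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO content VALUES (1, 'Video1', 'Comedy, Action, Supercool'), (2, 'Video2', 'Comet');
    INSERT INTO content_actors VALUES (1, 1), (1, 2), (2, 1);
    INSERT INTO actors VALUES (1, 'Jennifer Lopez'), (2, 'Bruce Willis');
""")

# {union} is filled in below so both variants can be compared.
TEMPLATE = """
    SELECT id, SUM(relevance) AS relevance
    FROM (
        SELECT c.record_num AS id,
               3 * match_score(c.title, :q) + match_score(c.keywords, :q) AS relevance
        FROM content AS c
        WHERE match_score(c.title || ' ' || c.keywords, :q) > 0
        {union}
        SELECT c.record_num AS id,
               2 * MAX(match_score(a.name, :q)) AS relevance
        FROM content AS c
        JOIN content_actors AS ca ON ca.content = c.record_num
        JOIN actors AS a ON a.record_num = ca.actor
        WHERE match_score(a.name, :q) > 0
        GROUP BY c.record_num
    ) AS x
    GROUP BY id
"""
params = {"q": "Comedy Action Bruce"}
union_all = db.execute(TEMPLATE.format(union="UNION ALL"), params).fetchall()
union_distinct = db.execute(TEMPLATE.format(union="UNION"), params).fetchall()

print(union_all, union_distinct)  # [(1, 4.0)] [(1, 2.0)]
```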

But that is not quite all...

SELECT #4:

SELECT c.*,
       ( ... views ... ) AS views
    FROM ( ... select #3 ... ) AS u
    JOIN content c  ON c.record_num = u.id
    WHERE c.enabled = 1
    ORDER BY u.relevance DESC

I suggest you run each of these steps by hand to validate them, gradually putting all the pieces together. Yes, it is complex, but it should be quite fast.
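The whole assembly (SELECT #4 wrapping #3, #2 and #1) can be checked end to end in the same way with Python's sqlite3 module. As before, `match_score` is a hypothetical stand-in for MATCH ... AGAINST; the `enabled` values and the disabled 'Video3' row are also hypothetical, there to exercise the c.enabled = 1 filter from the question's query. This only validates the logic; whether MySQL runs it fast depends on it actually using the FULLTEXT indexes, which EXPLAIN can confirm.

```python
import re
import sqlite3

def match_score(text, query):
    # Hypothetical stand-in for MATCH ... AGAINST (whole-word hit count).
    if text is None:
        return 0.0
    words = re.findall(r"\w+", text.lower())
    return float(sum(words.count(t) for t in query.lower().split()))

db = sqlite3.connect(":memory:")
db.create_function("match_score", 2, match_score)
db.executescript("""
    CREATE TABLE content (record_num INTEGER PRIMARY KEY, title TEXT, keywords TEXT, enabled INTEGER);
    CREATE TABLE content_actors (content INTEGER, actor INTEGER);
    CREATE TABLE actors (record_num INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE content_views (content INTEGER PRIMARY KEY, views INTEGER);
    INSERT INTO content VALUES (1, 'Video1', 'Comedy, Action, Supercool', 1),
                               (2, 'Video2', 'Comet', 1), (3, 'Video3', 'Comet', 0);
    INSERT INTO content_actors VALUES (1, 1), (1, 2), (2, 1);
    INSERT INTO actors VALUES (1, 'Jennifer Lopez'), (2, 'Bruce Willis');
    INSERT INTO content_views VALUES (1, 160), (2, 312);
""")

rows = db.execute("""
    SELECT c.record_num, c.title,
           (SELECT cv.views FROM content_views AS cv
             WHERE cv.content = c.record_num) AS views,   -- views lookup
           u.relevance
    FROM (
        SELECT id, SUM(relevance) AS relevance            -- select #3
        FROM (
            SELECT c.record_num AS id,                    -- select #1
                   3 * match_score(c.title, :q) + match_score(c.keywords, :q) AS relevance
            FROM content AS c
            WHERE match_score(c.title || ' ' || c.keywords, :q) > 0
            UNION ALL
            SELECT c.record_num AS id,                    -- select #2
                   2 * MAX(match_score(a.name, :q)) AS relevance
            FROM content AS c
            JOIN content_actors AS ca ON ca.content = c.record_num
            JOIN actors AS a ON a.record_num = ca.actor
            WHERE match_score(a.name, :q) > 0
            GROUP BY c.record_num
        ) AS x
        GROUP BY id
    ) AS u
    JOIN content AS c ON c.record_num = u.id
    WHERE c.enabled = 1
    ORDER BY u.relevance DESC
""", {"q": "Comet Bruce"}).fetchall()

print(rows)  # [(1, 'Video1', 160, 2.0), (2, 'Video2', 312, 1.0)]
```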
