Optimizing fulltext search across multiple tables
I want to search for the requested term ($q) in my content table, on the title and the keywords, and also on the models, which are in another table and linked through a table in between. I also need to get the number of views from another table.
This is the query I have been working on so far. The results are fine, but it is far too slow: 0.6s on average when I run it in phpMyAdmin, and we have millions of visitors per month.
SELECT DISTINCT SQL_CALC_FOUND_ROWS
c.*,
cv.views,
(MATCH (c.title) AGAINST ('{$q}') * 3) Relevance1,
MATCH (c.keywords) AGAINST ('{$q}') Relevance2,
(MATCH (a.`name`) AGAINST ('{$q}') * 2) Relevance3
FROM
content AS c
LEFT JOIN
content_actors AS ca ON ca.content = c.record_num
LEFT JOIN
actors AS a ON a.record_num = ca.actor
LEFT JOIN
content_views AS cv ON cv.content = c.record_num
WHERE
c.enabled = 1
GROUP BY c.title, c.length
HAVING (Relevance1 + Relevance2 + Relevance3) > 0
ORDER BY (Relevance1 + Relevance2 + Relevance3) DESC
The table structure looks like this:
content
record_num  title   keywords
1           Video1  Comedy, Action, Supercool
2           Video2  Comet

content_actors
content  actor
1        1
1        2
2        1

actors
record_num  name
1           Jennifer Lopez
2           Bruce Willis

content_views
content  views
1        160
2        312
Here are the indexes I found by running SHOW INDEX FROM tablename:
Table           Column_Name  Seq_in_index  Key_name     Index_type
------------------------------------------------------------------
content         record_num   1             PRIMARY      BTREE
content         keywords     1             keywords     FULLTEXT
content         keywords     2             title        FULLTEXT
content         title        1             title        FULLTEXT
content         description  1             description  FULLTEXT
content         keywords     1             keywords_2   FULLTEXT
content_actors  content      1             content      BTREE
content_actors  actor        2             content      BTREE
content_actors  actor        1             actor        BTREE
actors          record_num   1             PRIMARY      BTREE
actors          name         1             name         BTREE
actors          name         1             name_2       FULLTEXT
content_views   content      1             PRIMARY      BTREE
content_views   views        1             views        BTREE
Here is the EXPLAIN output for the query:
ID  SELECT_TYPE  TABLE  TYPE    POSSIBLE_KEYS       KEY      ROWS   EXTRA
1   SIMPLE       c      ref     enabled_2, enabled  enabled  29210  Using where; Using temporary; Using filesort
1   SIMPLE       ca     ref     content             content  1      Using index
1   SIMPLE       a      eq_ref  PRIMARY             PRIMARY  1
1   SIMPLE       cv     eq_ref  PRIMARY             PRIMARY  1
I am using the GROUP BY to avoid duplicate content, but the GROUP BY alone seems to double the time required to process the query.
EDIT: After playing with the query a bit, I realized that if I remove the GROUP BY I get duplicates, and if I leave the GROUP BY in, it doesn't pick up the proper Relevance3 value (without the GROUP BY, one of the duplicate rows has a value for Relevance3 while the other does not).
Add the MATCHes (OR'd together) to the WHERE clause. This will cut back significantly on the number of rows that SQL_CALC_FOUND_ROWS has to handle, and it eliminates the need for the HAVING clause.
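As a rough sketch of that suggestion, the filter below moves the three MATCH expressions from the question into the WHERE clause. It assumes a FULLTEXT index exists for each MATCH column list, and it keeps the question's '{$q}' placeholder as-is (in real code the term should be bound or escaped rather than interpolated). The select list is trimmed to two columns for brevity.

```sql
-- Sketch only: the HAVING filter rewritten as a WHERE filter.
-- A MATCH ... AGAINST in natural-language mode returns 0 for non-matching rows,
-- so OR-ing the three expressions keeps rows that match in at least one place.
SELECT DISTINCT c.record_num, c.title
FROM content AS c
LEFT JOIN content_actors AS ca ON ca.content = c.record_num
LEFT JOIN actors AS a ON a.record_num = ca.actor
WHERE c.enabled = 1
  AND (   MATCH (c.title)    AGAINST ('{$q}')
       OR MATCH (c.keywords) AGAINST ('{$q}')
       OR MATCH (a.`name`)   AGAINST ('{$q}') );
```

Rows that match nowhere are discarded before any grouping or sorting, which is where the savings come from.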
Instead of
cv.views,
...
LEFT JOIN content_views AS cv ON cv.content = c.record_num
do
( SELECT views FROM content_views WHERE content = c.record_num ) AS views,
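In context, the correlated subquery might look like the sketch below. Going by the SHOW INDEX output in the question, content_views has its PRIMARY key on content, so the subquery should return at most one row per content row. When no views row exists it yields NULL, just as the LEFT JOIN did.

```sql
-- Sketch only: views fetched by a scalar subquery instead of a LEFT JOIN.
SELECT c.record_num,
       c.title,
       ( SELECT cv.views
           FROM content_views AS cv
          WHERE cv.content = c.record_num ) AS views
FROM content AS c
WHERE c.enabled = 1;
```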
Edit
The LEFT and GROUP BY are needed because the actors are optional and there could be multiple actors. Since you don't need the actor name at all, you can probably get rid of the join by doing
WHERE ... AND EXISTS ( SELECT *
FROM content_actors AS ca
JOIN actors AS a ON ...
WHERE MATCH (a.`name`) AGAINST ('{$q}')
AND ca...
)
but that does not let you include the relevance in the ORDER BY.
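For reference, a filled-in version of the EXISTS form could look like the sketch below. The join conditions are taken from the question's own query. Combining the EXISTS with the title/keywords match through OR is an assumption about how the pieces were meant to fit, not something the answer spells out.

```sql
-- Sketch only: filter on actors with EXISTS, so no duplicates and no GROUP BY.
-- The actor match contributes to filtering but not to any relevance score.
SELECT c.record_num, c.title
FROM content AS c
WHERE c.enabled = 1
  AND (   MATCH (c.title, c.keywords) AGAINST ('{$q}')
       OR EXISTS ( SELECT 1
                     FROM content_actors AS ca
                     JOIN actors AS a ON a.record_num = ca.actor
                    WHERE ca.content = c.record_num
                      AND MATCH (a.`name`) AGAINST ('{$q}') ) );
```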
So, you need to build a subquery with a UNION (UNION ALL rather than UNION DISTINCT, so that the relevance values from both branches can be summed in SELECT #3). There will be 2 SELECTs:
SELECT #1:
SELECT c.record_num AS id,
3 * MATCH(c.title) AGAINST ('{$q}')
+ MATCH(c.keywords) AGAINST ('{$q}') AS relevance
FROM content AS c
WHERE MATCH(c.title, c.keywords) AGAINST ('{$q}')
(and have FULLTEXT(title, keywords)). This will efficiently fetch the ids for the content rows that are useful.
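Going by the SHOW INDEX output in the question, the index named title already appears to cover (title, keywords), while no FULLTEXT index covers title alone. MySQL generally requires the MATCH() column list to match a FULLTEXT index definition exactly (the documented exception is IN BOOLEAN MODE on MyISAM tables), so the per-column MATCH expressions in the select list may need their own indexes. A sketch, where ft_title is a hypothetical index name:

```sql
-- List the FULLTEXT indexes that already exist on the table.
SHOW INDEX FROM content WHERE Index_type = 'FULLTEXT';

-- Only if nothing covers (title) on its own; ft_title is a made-up name.
ALTER TABLE content ADD FULLTEXT INDEX ft_title (title);
```

The listing also shows two FULLTEXT indexes on keywords alone (keywords and keywords_2); one of them looks redundant, though that is worth confirming before dropping anything.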
SELECT #2:
SELECT c.record_num AS id,
2 * MAX(MATCH(a.`name`) AGAINST ('{$q}')) AS relevance
FROM content AS c
JOIN content_actors AS ca ON ca.content = c.record_num
JOIN actors AS a ON a.record_num = ca.actor
WHERE MATCH(a.`name`) AGAINST ('{$q}')
GROUP BY c.record_num
Be sure to have content_actors: INDEX(actor) and content: INDEX(record_num). This SELECT will efficiently start with actors and work back to content. And note that it does something different from your code when two actors MATCH; hopefully my MAX is a better solution.
Now, let's put things together...
SELECT #3:
SELECT id, SUM(relevance) AS relevance
FROM ( ( ... select #1 ... )
UNION ALL
( ... select #2 ... ) ) AS parts
GROUP BY id
But that is not quite all...
SELECT #4:
SELECT c.*,
( ... views ... ) AS views
FROM ( ... select #3 ... ) AS u
JOIN content AS c ON c.record_num = u.id
ORDER BY u.relevance DESC
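Assembled, the four steps might look like the sketch below. It assumes the column names from the question (record_num, name, enabled), keeps the '{$q}' placeholder as-is, and adds the c.enabled = 1 filter from the original query. The aliases c1, parts and u are arbitrary. Pagination (LIMIT, SQL_CALC_FOUND_ROWS) is left out.

```sql
SELECT c.*,
       ( SELECT cv.views
           FROM content_views AS cv
          WHERE cv.content = c.record_num ) AS views,
       u.relevance
FROM (
    -- SELECT #3: one row per content id, summing both sources of relevance
    SELECT id, SUM(relevance) AS relevance
    FROM (
        -- SELECT #1: title (weight 3) and keywords (weight 1)
        SELECT c1.record_num AS id,
               3 * MATCH(c1.title) AGAINST ('{$q}')
                 + MATCH(c1.keywords) AGAINST ('{$q}') AS relevance
        FROM content AS c1
        WHERE MATCH(c1.title, c1.keywords) AGAINST ('{$q}')

        UNION ALL

        -- SELECT #2: best-matching actor per content row (weight 2)
        SELECT c2.record_num AS id,
               2 * MAX(MATCH(a.`name`) AGAINST ('{$q}')) AS relevance
        FROM content AS c2
        JOIN content_actors AS ca ON ca.content = c2.record_num
        JOIN actors AS a ON a.record_num = ca.actor
        WHERE MATCH(a.`name`) AGAINST ('{$q}')
        GROUP BY c2.record_num
    ) AS parts
    GROUP BY id
) AS u
JOIN content AS c ON c.record_num = u.id
WHERE c.enabled = 1
ORDER BY u.relevance DESC;
```

Running EXPLAIN on each inner SELECT separately first is a reasonable way to confirm that the FULLTEXT indexes are actually being used before testing the whole statement.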
I suggest you run each of these steps by hand to validate them, gradually putting all the pieces together. Yes, it is complex, but it should be quite fast.