MySQL: optimizing a query with three joins
First things first: what I am doing works perfectly fine. I'm just checking whether there is any room for improvement, and whether the way I'm doing things is standard and follows good practices.
These are the tables in question: item, topic, item_topic and item_like_audit.
This is my use case: a topic can contain many items; an item can have N likes on it; each like is logged in the item_like_audit table, so that it can be queried at a later time for ranking purposes.
This is what the query is trying to achieve: fetch the items under a specific topic that have had "like" activity in the past 7 days, ordered from the highest like count to the lowest, one page at a time.
Can the following query or the underlying schema be improved in any way (for performance or memory gains)?
Query:
SELECT DISTINCT item.* FROM item
/* Match items under this specific topic */
JOIN topic
ON topic.slug = ?
AND topic.deleted_at IS NULL
JOIN item_topic
ON item_topic.item_id = item.id
AND item_topic.topic_id = topic.id
AND item_topic.deleted_at IS NULL
/* Match items that have had "like" activity in the past 7 days */
JOIN item_like_audit
ON item_like_audit.item_id = item.id
AND item_like_audit.created_at >= (CURRENT_DATE - INTERVAL 7 DAY)
WHERE item.deleted_at IS NULL
/* Order by highest like count to lowest */
ORDER BY item.like_count DESC
/* Pagination */
LIMIT ? OFFSET ?
Schema:
CREATE TABLE item (
id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
slug VARCHAR(255) NOT NULL UNIQUE,
tagline VARCHAR(255) NOT NULL,
description VARCHAR(1000) NOT NULL,
price FLOAT NOT NULL,
like_count INT(10) NOT NULL DEFAULT 0,
images VARCHAR(1000) NOT NULL,
created_at TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
deleted_at TIMESTAMP NULL DEFAULT NULL,
PRIMARY KEY (id)
);
CREATE TABLE item_like_audit (
id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
item_id INT(10) UNSIGNED NOT NULL,
user_id INT(10) UNSIGNED NOT NULL,
created_at TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id),
KEY `item_like_audit_created_at_index` (`created_at`)
);
CREATE TABLE topic (
id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
slug VARCHAR(255) NOT NULL UNIQUE,
created_at TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
deleted_at TIMESTAMP NULL DEFAULT NULL,
PRIMARY KEY (id)
);
CREATE TABLE item_topic (
id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
item_id INT(10) NOT NULL,
topic_id INT(10) NOT NULL,
created_at TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
deleted_at TIMESTAMP NULL DEFAULT NULL,
PRIMARY KEY (id)
);
Since you are only returning item records, you could try this for possibly improved performance:
select item.*
from item
where item.deleted_at is null
and exists (select 1 from item_topic
            where item_topic.item_id = item.id
            and item_topic.deleted_at is null
            and exists (select 1 from topic
                        where topic.id = item_topic.topic_id
                        and topic.deleted_at is null
                        and topic.slug = ?))
and exists (select 1 from item_like_audit
            where item_like_audit.item_id = item.id
            and item_like_audit.created_at >= (current_date - interval 7 day))
order by item.like_count desc
This can potentially improve performance since it avoids the DISTINCT operator: each item row is tested with EXISTS once, instead of being multiplied by the joins and then de-duplicated.
Assuming item_topic(item_id,topic_id) is unique, we could do away with the "Using filesort" operation by getting rid of the DISTINCT keyword, and rewriting the check of item_like_audit as an EXISTS correlated subquery instead of a JOIN operation.
We'd have a guarantee of that uniqueness if we had:
CREATE UNIQUE INDEX item_topic_UX1 ON item_topic (topic_id, item_id);
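A quick way to see what that unique index buys: the sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, and a hypothetical, trimmed-down item_topic table. A second row with the same (topic_id, item_id) pair is rejected with an IntegrityError, while linking the same item to a different topic is still allowed.

```python
import sqlite3

# Hypothetical, trimmed-down item_topic table (SQLite standing in for MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_topic (id INTEGER PRIMARY KEY, item_id INTEGER NOT NULL,
                         topic_id INTEGER NOT NULL, deleted_at TEXT);
CREATE UNIQUE INDEX item_topic_UX1 ON item_topic (topic_id, item_id);
""")

conn.execute("INSERT INTO item_topic (item_id, topic_id) VALUES (1, 7)")
try:
    # Same (topic_id, item_id) pair again: the unique index rejects it.
    conn.execute("INSERT INTO item_topic (item_id, topic_id) VALUES (1, 7)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The same item under a different topic is still accepted.
conn.execute("INSERT INTO item_topic (item_id, topic_id) VALUES (1, 8)")
row_count = conn.execute("SELECT COUNT(*) FROM item_topic").fetchone()[0]

print(duplicate_rejected, row_count)  # True 2
```

One consequence worth weighing with soft deletes: a row whose deleted_at is set still occupies its (topic_id, item_id) pair, so re-linking an item to a topic would mean clearing deleted_at on the existing row rather than inserting a new one.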
We already have guarantees of uniqueness for topic(slug), topic(id), item(id), ...
SELECT item.*
FROM item
/* Match items under this specific topic */
JOIN item_topic
ON item_topic.item_id = item.id
AND item_topic.deleted_at IS NULL
JOIN topic
ON topic.id = item_topic.topic_id
AND topic.slug = ?
AND topic.deleted_at IS NULL
WHERE item.deleted_at IS NULL
/* Match items that have had "like" activity in the past 7 days */
AND EXISTS ( SELECT 1
FROM item_like_audit
WHERE item_like_audit.item_id = item.id
AND item_like_audit.created_at >= DATE(NOW()) + INTERVAL -7 DAY
)
/* Order by highest like count to lowest */
ORDER BY item.like_count DESC
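To see why a JOIN to item_like_audit forces the DISTINCT in the first place, here is a minimal sketch using Python's sqlite3 module and an in-memory SQLite database as a stand-in for MySQL. The trimmed-down tables and sample rows are hypothetical, and SQLite's date('now', '-7 days') plays the role of DATE(NOW()) + INTERVAL -7 DAY. The JOIN yields one row per qualifying like, while the EXISTS version yields each item once.

```python
import sqlite3

# Hypothetical, trimmed-down tables (SQLite standing in for MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, like_count INTEGER,
                   deleted_at TEXT);
CREATE TABLE item_like_audit (id INTEGER PRIMARY KEY, item_id INTEGER,
                              user_id INTEGER, created_at TEXT);
INSERT INTO item (id, name, like_count) VALUES (1, 'a', 3), (2, 'b', 1);
-- Item 1 was liked three times this week, item 2 once.
INSERT INTO item_like_audit (item_id, user_id, created_at) VALUES
  (1, 10, date('now')), (1, 11, date('now')), (1, 12, date('now')),
  (2, 10, date('now'));
""")

# JOIN: one output row per matching like, so item 1 appears three times.
join_rows = conn.execute("""
    SELECT item.id FROM item
    JOIN item_like_audit
      ON item_like_audit.item_id = item.id
     AND item_like_audit.created_at >= date('now', '-7 days')
    WHERE item.deleted_at IS NULL
    ORDER BY item.like_count DESC
""").fetchall()

# EXISTS: each item row is tested once, so no duplicates and no DISTINCT.
exists_rows = conn.execute("""
    SELECT item.id FROM item
    WHERE item.deleted_at IS NULL
      AND EXISTS (SELECT 1 FROM item_like_audit
                  WHERE item_like_audit.item_id = item.id
                    AND item_like_audit.created_at >= date('now', '-7 days'))
    ORDER BY item.like_count DESC
""").fetchall()

print(join_rows)    # [(1,), (1,), (1,), (2,)]
print(exists_rows)  # [(1,), (2,)]
```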
For improved performance of the correlated subquery, we could create a covering index:
CREATE INDEX item_like_audit_IX1 ON item_like_audit (item_id, created_at);
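One way to check that the correlated subquery can be answered from this index is to inspect the plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN (through Python's sqlite3) as a stand-in for MySQL's EXPLAIN, on hypothetical trimmed-down tables; SQLite's wording differs from MySQL's, but the subquery step should name item_like_audit_IX1, typically as a covering index.

```python
import sqlite3

# Hypothetical, trimmed-down tables (SQLite standing in for MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, like_count INTEGER, deleted_at TEXT);
CREATE TABLE item_like_audit (id INTEGER PRIMARY KEY, item_id INTEGER NOT NULL,
                              user_id INTEGER NOT NULL, created_at TEXT);
CREATE INDEX item_like_audit_IX1 ON item_like_audit (item_id, created_at);
""")

query = """
SELECT item.id FROM item
WHERE item.deleted_at IS NULL
  AND EXISTS (SELECT 1 FROM item_like_audit
              WHERE item_like_audit.item_id = item.id
                AND item_like_audit.created_at >= date('now', '-7 days'))
ORDER BY item.like_count DESC
"""

# Each EXPLAIN QUERY PLAN row has four columns; the last one is the
# human-readable description of the step.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)
```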
We expect the unique index we created earlier to be used for the join operation, so that should also improve performance. We could get a covering index if we included the deleted_at column:
CREATE INDEX item_topic_IX2 ON item_topic (topic_id, item_id, deleted_at);
That index is redundant with the unique index we created earlier. If we still want to guarantee uniqueness, flip the order of the columns in the unique index:
DROP INDEX item_topic_UX1 ON item_topic ;
CREATE UNIQUE INDEX item_topic_UX1 ON item_topic (item_id,topic_id);
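To check that the item_topic side of the join can be served by item_topic_IX2, a similar sketch prints SQLite's plan (Python's sqlite3 as a stand-in for MySQL's EXPLAIN). The tables are hypothetical trimmed-down copies, the slug value 'news' is made up, and the unique index is left out to keep the plan simple. The item_topic step should name item_topic_IX2, usually as a covering index, since the query reads only topic_id, item_id and deleted_at from that table.

```python
import sqlite3

# Hypothetical, trimmed-down tables (SQLite standing in for MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, like_count INTEGER, deleted_at TEXT);
CREATE TABLE topic (id INTEGER PRIMARY KEY, slug TEXT NOT NULL UNIQUE,
                    deleted_at TEXT);
CREATE TABLE item_topic (id INTEGER PRIMARY KEY, item_id INTEGER NOT NULL,
                         topic_id INTEGER NOT NULL, deleted_at TEXT);
CREATE INDEX item_topic_IX2 ON item_topic (topic_id, item_id, deleted_at);
""")

query = """
SELECT item.id FROM item
JOIN item_topic ON item_topic.item_id = item.id
               AND item_topic.deleted_at IS NULL
JOIN topic ON topic.id = item_topic.topic_id
          AND topic.slug = ?
          AND topic.deleted_at IS NULL
WHERE item.deleted_at IS NULL
ORDER BY item.like_count DESC
"""

# The slug placeholder is bound just as it would be for the real query.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("news",))]
for step in plan:
    print(step)
```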
If we don't have guaranteed uniqueness, then I would favor adding a GROUP BY item.id clause over using the DISTINCT keyword.
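A small sketch of what GROUP BY item.id does when item_topic is not unique, again on SQLite via Python's sqlite3 as a stand-in for MySQL, with hypothetical tables and rows. Item 1 is linked to the same topic twice, so the plain join returns it twice, while grouping on the primary key returns it once. (MySQL 5.7.5 and later accept SELECT item.* ... GROUP BY item.id under ONLY_FULL_GROUP_BY because the other item columns are functionally dependent on the primary key.)

```python
import sqlite3

# Hypothetical, trimmed-down tables (SQLite standing in for MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, like_count INTEGER,
                   deleted_at TEXT);
CREATE TABLE topic (id INTEGER PRIMARY KEY, slug TEXT UNIQUE, deleted_at TEXT);
CREATE TABLE item_topic (id INTEGER PRIMARY KEY, item_id INTEGER,
                         topic_id INTEGER, deleted_at TEXT);
INSERT INTO item (id, name, like_count) VALUES (1, 'a', 5), (2, 'b', 2);
INSERT INTO topic (id, slug) VALUES (7, 'news');
-- No unique index, so item 1 is accidentally linked to topic 7 twice.
INSERT INTO item_topic (item_id, topic_id) VALUES (1, 7), (1, 7), (2, 7);
""")

base = """
SELECT item.id, item.name FROM item
JOIN item_topic ON item_topic.item_id = item.id
               AND item_topic.deleted_at IS NULL
JOIN topic ON topic.id = item_topic.topic_id
          AND topic.slug = 'news'
          AND topic.deleted_at IS NULL
WHERE item.deleted_at IS NULL
"""

# Without grouping, the duplicate link row duplicates item 1.
plain = conn.execute(base + " ORDER BY item.like_count DESC").fetchall()
# Grouping on the primary key collapses the duplicates.
grouped = conn.execute(
    base + " GROUP BY item.id ORDER BY item.like_count DESC").fetchall()

print(plain)    # [(1, 'a'), (1, 'a'), (2, 'b')]
print(grouped)  # [(1, 'a'), (2, 'b')]
```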
Use EXPLAIN to see the execution plan, and verify that the appropriate indexes are being used.
If we can't guarantee uniqueness of (item_id,topic_id) in item_topic, and the overhead of the "Using filesort" operation for the GROUP BY is still too high, we could try checking the "matching topic" condition using an EXISTS. (But I don't hold out much hope that this will be any faster.)
SELECT item.*
FROM item
WHERE item.deleted_at IS NULL
AND EXISTS ( SELECT 1
FROM topic
JOIN item_topic
ON item_topic.item_id = item.id
AND item_topic.topic_id = topic.id
AND item_topic.deleted_at IS NULL
JOIN item_like_audit
ON item_like_audit.item_id = item.id
AND item_like_audit.created_at >= DATE(NOW()) + INTERVAL -7 DAY
WHERE topic.slug = ?
AND topic.deleted_at IS NULL
)
ORDER BY item.like_count DESC
We are going to need suitable indexes available for the correlated subquery to perform well.
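As a sanity check that the EXISTS-only rewrite returns the same rows as the original join-plus-DISTINCT shape, here is a small sketch on SQLite via Python's sqlite3, standing in for MySQL (so DATE(NOW()) + INTERVAL -7 DAY becomes date('now', '-7 days')). The tables, sample rows and the 'news' slug are hypothetical; the two indexes mirror the covering indexes proposed earlier in this answer.

```python
import sqlite3

# Hypothetical, trimmed-down tables and invented rows (SQLite standing in for MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, like_count INTEGER, deleted_at TEXT);
CREATE TABLE topic (id INTEGER PRIMARY KEY, slug TEXT UNIQUE, deleted_at TEXT);
CREATE TABLE item_topic (id INTEGER PRIMARY KEY, item_id INTEGER,
                         topic_id INTEGER, deleted_at TEXT);
CREATE TABLE item_like_audit (id INTEGER PRIMARY KEY, item_id INTEGER,
                              user_id INTEGER, created_at TEXT);
-- Indexes that support the correlated subquery.
CREATE INDEX item_topic_IX2 ON item_topic (topic_id, item_id, deleted_at);
CREATE INDEX item_like_audit_IX1 ON item_like_audit (item_id, created_at);

INSERT INTO item (id, like_count, deleted_at) VALUES
  (1, 9, NULL), (2, 4, NULL), (3, 7, NULL), (4, 1, '2020-01-01'), (5, 6, NULL);
INSERT INTO topic (id, slug) VALUES (7, 'news'), (8, 'misc');
-- Item 1 is linked to 'news' twice (no unique index on the pair).
INSERT INTO item_topic (item_id, topic_id) VALUES
  (1, 7), (1, 7), (2, 7), (3, 8), (4, 7), (5, 7);
INSERT INTO item_like_audit (item_id, user_id, created_at) VALUES
  (1, 10, date('now')), (1, 11, date('now')),
  (2, 10, date('now', '-30 days')),
  (3, 10, date('now')), (4, 10, date('now')), (5, 10, date('now', '-2 days'));
""")

# Join-based version with DISTINCT (the shape of the original query).
distinct_rows = conn.execute("""
    SELECT DISTINCT item.id, item.like_count FROM item
    JOIN topic ON topic.slug = ? AND topic.deleted_at IS NULL
    JOIN item_topic ON item_topic.item_id = item.id
                   AND item_topic.topic_id = topic.id
                   AND item_topic.deleted_at IS NULL
    JOIN item_like_audit ON item_like_audit.item_id = item.id
                        AND item_like_audit.created_at >= date('now', '-7 days')
    WHERE item.deleted_at IS NULL
    ORDER BY item.like_count DESC
""", ("news",)).fetchall()

# Single EXISTS version: each item row is tested once, no DISTINCT needed.
exists_rows = conn.execute("""
    SELECT item.id, item.like_count FROM item
    WHERE item.deleted_at IS NULL
      AND EXISTS (SELECT 1 FROM topic
                  JOIN item_topic ON item_topic.item_id = item.id
                                 AND item_topic.topic_id = topic.id
                                 AND item_topic.deleted_at IS NULL
                  JOIN item_like_audit
                    ON item_like_audit.item_id = item.id
                   AND item_like_audit.created_at >= date('now', '-7 days')
                  WHERE topic.slug = ? AND topic.deleted_at IS NULL)
    ORDER BY item.like_count DESC
""", ("news",)).fetchall()

print(distinct_rows)  # [(1, 9), (5, 6)]
print(exists_rows)    # [(1, 9), (5, 6)]
```

Item 2 drops out because its only like is 30 days old, item 3 belongs to another topic, and item 4 is soft-deleted; both versions agree on that.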