MySQL Slow JOIN query when using ORDER BY

I have a problem with this query:

SELECT a.*
FROM smartressort AS s
JOIN smartressort_to_ressort AS str
    ON s.id = str.smartressort_id
JOIN article_to_ressort AS atr
    ON str.ressort_id = atr.ressort_id
JOIN article AS a FORCE INDEX (source_created)
    ON atr.article_id = a.id    
WHERE
    s.id = 1
ORDER BY
    a.created_at DESC
LIMIT 25;

This one is really slow; it sometimes takes 14 seconds.

EXPLAIN shows this:

id  select_type  table  type    possible_keys       key      key_len  ref                              rows    Extra
1   SIMPLE       s      const   PRIMARY             PRIMARY  4        const                            1       Using index; Using temporary; Using filesort
1   SIMPLE       str    ref     PRIMARY,ressort_id  PRIMARY  4        const                            1       Using index
1   SIMPLE       atr    ref     PRIMARY,article_id  PRIMARY  4        com.nps.lvz-prod.str.ressort_id  1262    Using index
1   SIMPLE       a      ALL     NULL                NULL     NULL     NULL                             146677  Using where; Using join buffer (flat, BNL join)

So the last "ALL" type is really bad. I already tried to force the use of the index, with no luck.

The article table looks like this:

CREATE TABLE `article` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`node_id` varchar(255) NOT NULL DEFAULT '',
`object_id` varchar(255) DEFAULT NULL,
`headline_1` varchar(255) NOT NULL DEFAULT '',
`created_at` datetime(3) NOT NULL,
`updated_at` datetime(3) NOT NULL,
`teaser_text` longtext NOT NULL,
`content_text` longtext NOT NULL,
PRIMARY KEY (`id`),
KEY `article_nodeid` (`node_id`),
KEY `article_objectid` (`object_id`),
KEY `source_created` (`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=161116 DEFAULT CHARSET=utf8mb4 ROW_FORMAT=DYNAMIC;

When I remove the FORCE INDEX, the EXPLAIN gets better, but the query is still slow.

EXPLAIN without FORCE INDEX:

id  select_type  table  type    possible_keys       key      key_len  ref                              rows    Extra
1   SIMPLE       s      const   PRIMARY             PRIMARY  4        const                            1       Using index; Using temporary; Using filesort
1   SIMPLE       str    ref     PRIMARY,ressort_id  PRIMARY  4        const                            1       Using index
1   SIMPLE       atr    ref     PRIMARY,article_id  PRIMARY  4        com.nps.lvz-prod.str.ressort_id  1262    Using index
1   SIMPLE       a      eq_ref  PRIMARY             PRIMARY  4        com.nps.lvz-prod.atr.article_id  1

And for another smartressort id (3) it looks like this:

id  select_type  table  type    possible_keys       key      key_len  ref                              rows    Extra
1   SIMPLE       s      const   PRIMARY             PRIMARY  4        const                            1       Using index; Using temporary; Using filesort
1   SIMPLE       str    ref     PRIMARY,ressort_id  PRIMARY  4        const                            13      Using index
1   SIMPLE       atr    ref     PRIMARY,article_id  PRIMARY  4        com.nps.lvz-prod.str.ressort_id  1262    Using index
1   SIMPLE       a      eq_ref  PRIMARY             PRIMARY  4        com.nps.lvz-prod.atr.article_id  1

Here we have 13 ressorts for one smartressort. Estimated rows: 1 x 1 x 13 x 1262 x 1 = 16406
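The figure above is the product of the per-table rows values from EXPLAIN, which are optimizer estimates rather than exact counts. As a rough check, sketched here using only the tables and columns shown in the question, the join rows that feed the sort can be counted directly:

```sql
-- Count the (ressort, article) pairs the join produces for smartressort 3.
-- This is the number of rows that has to be sorted before LIMIT 25 applies.
SELECT COUNT(*)                       AS joined_rows,
       COUNT(DISTINCT atr.article_id) AS distinct_articles
FROM smartressort_to_ressort AS str
JOIN article_to_ressort AS atr
    ON atr.ressort_id = str.ressort_id
WHERE str.smartressort_id = 3;
```

If joined_rows is noticeably larger than distinct_articles, the same article is assigned to several ressorts, and the original JOIN query can return that article more than once.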

1) What can I do to make this query faster?

2) What's wrong with the source_created index?

The SELECT * you have in your query is ugly, and it can often be an index killer. It can preclude the use of an index, because most indices you would define would not cover every column demanded by SELECT *. The approach of this answer is to index all other tables in your query, which should incentivize MySQL to do just a single scan over the article table.

CREATE INDEX idx1 ON article_to_ressort (article_id, ressort_id);
CREATE INDEX idx2 ON smartressort_to_ressort (ressort_id, smartressort_id);

These two indices should speed up the joining process. Note that I did not define an index for the smartressort table, assuming that its id column is already the primary key. I would probably write your query starting with the article table and joining outwards, but it should not really matter.

Also, forcing an index is mostly either a bad idea or unnecessary. The optimizer can usually figure out when it is best to use an index.
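As a sketch of that suggestion, here is the same query written from the article table outwards and without the FORCE INDEX hint. It assumes the two indexes above have been created; the leading EXPLAIN shows whether the optimizer actually picks them.

```sql
-- Same joins and filter as the query in the question, starting from article
-- and leaving the choice of index to the optimizer.
EXPLAIN
SELECT a.*
FROM article AS a
JOIN article_to_ressort AS atr
    ON atr.article_id = a.id
JOIN smartressort_to_ressort AS str
    ON str.ressort_id = atr.ressort_id
JOIN smartressort AS s
    ON s.id = str.smartressort_id
WHERE s.id = 1
ORDER BY a.created_at DESC
LIMIT 25;
```

Remove the EXPLAIN keyword to run the query itself and time it.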

SELECT many columns FROM tables ORDER BY something LIMIT few is a notorious performance antipattern; it has to retrieve and order a whole mess of rows and columns, just to discard all but a few rows of the result set.

The trick is to figure out which values of article.id you need in your result set, then retrieve just the rows for those values. It's called a deferred join.

This should get you that set of id values. There's probably no need to join the smartressort table because smartressort_to_ressort contains the id values you need.

                 SELECT a.id
                   FROM article a
                   JOIN article_to_ressort atr ON a.id = atr.article_id
                   JOIN smartressort_to_ressort str ON atr.ressort_id = str.ressort_id
                  WHERE str.smartressort_id = 1
                  ORDER BY a.created_at DESC
                  LIMIT 25

Then you can use this as a subquery to get the rows you need.

SELECT a.*
  FROM article a
 WHERE a.id IN (
                 SELECT a.id
                   FROM article a
                   JOIN article_to_ressort atr ON a.id = atr.article_id
                   JOIN smartressort_to_ressort str ON atr.ressort_id = str.ressort_id
                  WHERE str.smartressort_id = 1
                  ORDER BY a.created_at DESC
                  LIMIT 25
               )
 ORDER BY a.created_at DESC

The second ORDER BY makes sure the rows from article are in a predictable order. Your index optimization work, then, need only apply to the subquery.
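One caveat: MySQL has long rejected LIMIT inside an IN subquery with error 1235 ("This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'"). If your server does the same, the deferred join can be expressed as a join against a derived table instead. The following is a sketch under that assumption; the aliases a2 and newest are made up for illustration.

```sql
-- Step 1 (derived table): find the ids of the 25 newest matching articles.
-- DISTINCT guards against an article that is assigned to several ressorts.
-- Step 2 (outer query): fetch the wide rows for just those ids.
SELECT a.*
FROM article AS a
JOIN (
    SELECT DISTINCT a2.id, a2.created_at
    FROM article AS a2
    JOIN article_to_ressort AS atr
        ON a2.id = atr.article_id
    JOIN smartressort_to_ressort AS str
        ON atr.ressort_id = str.ressort_id
    WHERE str.smartressort_id = 1
    ORDER BY a2.created_at DESC
    LIMIT 25
) AS newest
    ON newest.id = a.id
ORDER BY a.created_at DESC;
```

The derived table only touches narrow columns (id and created_at), so the longtext columns are read for at most 25 rows.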

In addition to @TimBiegelsen's great answer, I would recommend modifying your source_created index:

...
KEY `source_created` (`id`, `created_at`)

The gain would be that MySQL could use it for sorting and wouldn't need to fetch all 16406 rows. It may or may not help, but it is worth a try (perhaps with an explicit hint to use it).

To start with: you can remove the smartressort table from your query, as it doesn't add anything to it.

The following is your query rewritten. We want all ressorts for smart ressort #1 and then all articles for these ressorts. Of these we show the newest 25.

SELECT *
FROM article
WHERE id IN
(
  SELECT article_id
  FROM article_to_ressort 
  WHERE ressort_id IN
  (
    SELECT ressort_id
    FROM smartressort_to_ressort
    WHERE smartressort_id = 1
  )
)
ORDER BY created_at DESC
LIMIT 25;

Now which indexes would be needed to help the DBMS with this? Start with the inner table (smartressort_to_ressort). We access all records with a given smartressort_id and we want to get the associated ressort_id. So the index should contain these two columns in this order. The same goes for article_to_ressort and its ressort_id and article_id. Finally, we want to select the articles by the article IDs found and order them by created_at.

CREATE INDEX idx1 ON smartressort_to_ressort (smartressort_id, ressort_id);
CREATE INDEX idx2 ON article_to_ressort (ressort_id, article_id);
CREATE INDEX idx3 ON article (id, created_at);

Anyway, these indexes are just an offer to the DBMS. It may decide against them. This is especially true for the index on the article table. How many rows does the DBMS expect to access for one smartressort_id, i.e. how many rows may be in the IN clause? If the DBMS thinks that this might well be about 10% of all article IDs, it may already decide to read the table sequentially rather than muddle its way through the index for so many rows.

So for me the solution was this:
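To see whether the DBMS takes up the offer, one option is to inspect the indexes and then look at the plan. This is only a sketch using the table names from this thread; the output will differ per server and data set.

```sql
-- List the indexes on each table together with their cardinality estimates.
SHOW INDEX FROM smartressort_to_ressort;
SHOW INDEX FROM article_to_ressort;
SHOW INDEX FROM article;

-- Refresh the statistics the optimizer bases its row estimates on.
ANALYZE TABLE smartressort_to_ressort, article_to_ressort, article;

-- The "key" column of the plan shows which index, if any, was chosen per table.
EXPLAIN
SELECT *
FROM article
WHERE id IN
(
  SELECT article_id
  FROM article_to_ressort
  WHERE ressort_id IN
  (
    SELECT ressort_id
    FROM smartressort_to_ressort
    WHERE smartressort_id = 1
  )
)
ORDER BY created_at DESC
LIMIT 25;
```

If the plan still shows type ALL for article, the optimizer has judged a sequential read to be cheaper than the index for the number of ids it expects.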

SELECT a.*
FROM article as a  USE INDEX (source_created)
where a.id in (
             SELECT atr.article_id
               from smartressort_to_ressort str 
               JOIN article_to_ressort atr  ON atr.ressort_id = str.ressort_id
              WHERE str.smartressort_id = 1
) 
ORDER BY a.created_at DESC
LIMIT 25;

This only needs ~35 ms. The EXPLAIN looks like this:

id  select_type   table        type    possible_keys            key             key_len  ref                              rows  Extra
1   PRIMARY       a            index   NULL                     source_created  7        NULL                             1
1   PRIMARY       <subquery2>  eq_ref  distinct_key             distinct_key    4        func                             1
2   MATERIALIZED  str          ref     PRIMARY,ressort_id,idx1  PRIMARY         4        const                            1     Using index
2   MATERIALIZED  atr          ref     PRIMARY,article_id,idx2  PRIMARY         4        com.nps.lvz-prod.str.ressort_id  1262  Using index

Even so, the EXPLAIN of this query looks better to me, but I don't know exactly why:

explain SELECT a.*, NOW()
FROM article as a  USE INDEX (source_created)
where a.id in (SELECT atr.article_id
    FROM smartressort AS s
    JOIN smartressort_to_ressort AS str
    ON s.id = str.smartressort_id
    JOIN article_to_ressort AS atr
    ON str.ressort_id = atr.ressort_id
    WHERE s.id = 1
) 
ORDER BY a.created_at DESC
LIMIT 25;

Output:

id  select_type  table  type    possible_keys            key             key_len  ref                                                    rows  Extra
1   PRIMARY      s      const   PRIMARY                  PRIMARY         4        const                                                  1     Using index
1   PRIMARY      a      index   NULL                     source_created  7        NULL                                                   25
1   PRIMARY      str    ref     PRIMARY,ressort_id,idx1  PRIMARY         4        const                                                  1     Using index
1   PRIMARY      atr    eq_ref  PRIMARY,article_id,idx2  PRIMARY         8        com.nps.lvz-prod.str.ressort_id,com.nps.lvz-prod.a.id  1     Using index; FirstMatch(a)
