
MySQL: "Using temporary; Using filesort" when ordering by an indexed column, but not when ordering by the primary key

I'm really trying to get this right, but I can't figure out what's going on. The problem is that MySQL falls back to a filesort when I order by an indexed column in a join.

Three tables:

CREATE TABLE IF NOT EXISTS articles (
 pk int unsigned NOT NULL AUTO_INCREMENT,
 id varchar(254) NOT NULL,
 title VARCHAR(128) DEFAULT NULL,
 text VARCHAR(4096) DEFAULT NULL,
 publicationTime DATETIME DEFAULT NULL,
 KEY publicationTime (publicationTime),
 PRIMARY KEY (pk)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE IF NOT EXISTS articles_channels (
 pk int unsigned NOT NULL AUTO_INCREMENT,
 article int unsigned NOT NULL,
 channel int unsigned NOT NULL,
 UNIQUE KEY ac (article,channel),
 PRIMARY KEY(pk)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;


CREATE TABLE IF NOT EXISTS channels (
 pk INT UNSIGNED NOT NULL AUTO_INCREMENT,
 id VARCHAR(64) NOT NULL,
 UNIQUE KEY id (id),
 PRIMARY KEY (pk)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Articles are attached to zero or more channels via the articles_channels table.

The aim of the query is to find articles that are included in one set of channels (S1) and excluded from another set (S2), ordered by publicationTime descending.

Example: give me the first 10 articles (by publicationTime) that belong to channel 'c1' and are not in any of ('c2','c3').

The query:

SELECT
 SUM(CASE WHEN c.id IN ('c1') THEN 1 ELSE 0 END) AS cin,
 SUM(CASE WHEN c.id IN ('c2','c3') THEN 1 ELSE 0 END) AS cout,
 a.publicationTime
FROM articles a
 LEFT JOIN articles_channels ac ON ac.article = a.pk
 LEFT JOIN channels c ON ac.channel = c.pk
GROUP BY a.pk
HAVING cin >= 1 AND cout = 0
ORDER BY a.publicationTime DESC
LIMIT 1,10;

The query gives the following EXPLAIN output:

[image: EXPLAIN output, not preserved]

When I change ORDER BY a.publicationTime to ORDER BY a.pk, the 'Using temporary; Using filesort' disappears. I just can't understand why. Very thankful for any help.

BR

Niclas

In case it helps anyone: the (partial) solution was to split the query in two so that the full scan no longer spilled to disk. Selecting only the primary keys in the full scan was much cheaper, because it allowed that step to run in memory. A second query then fetches the rows from the main table with WHERE pk IN (pk1,pk2,pk3,..). The underlying problem still exists, but in my case everything could now run in memory, which greatly improved performance.
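The two-step idea described above could be sketched roughly as follows. This is only an illustration under several assumptions: it uses Python's built-in sqlite3 with a simplified schema as a stand-in for MySQL (so it shows the shape of the two queries, not MySQL's actual memory behaviour), the function names top_article_pks, fetch_articles and build_demo_db are made up for this example, and both channel lists are assumed to be non-empty. Step 1 uses inner joins, since the cin >= 1 condition discards articles without channels anyway.

```python
import sqlite3


def top_article_pks(conn, include, exclude, limit=10, offset=0):
    """Step 1: touch only narrow columns and return the matching primary keys."""
    inc = ",".join("?" * len(include))  # assumes include is non-empty
    exc = ",".join("?" * len(exclude))  # assumes exclude is non-empty
    sql = f"""
        SELECT a.pk
        FROM articles a
        JOIN articles_channels ac ON ac.article = a.pk
        JOIN channels c ON c.pk = ac.channel
        GROUP BY a.pk, a.publicationTime
        HAVING SUM(c.id IN ({inc})) >= 1 AND SUM(c.id IN ({exc})) = 0
        ORDER BY a.publicationTime DESC
        LIMIT ? OFFSET ?
    """
    params = [*include, *exclude, limit, offset]
    return [row[0] for row in conn.execute(sql, params)]


def fetch_articles(conn, pks):
    """Step 2: fetch the wide rows by primary key, keeping the step 1 order."""
    if not pks:
        return []
    marks = ",".join("?" * len(pks))
    sql = (
        "SELECT pk, title, text, publicationTime "
        f"FROM articles WHERE pk IN ({marks})"
    )
    by_pk = {row[0]: row for row in conn.execute(sql, list(pks))}
    return [by_pk[pk] for pk in pks]


def build_demo_db():
    """Hypothetical sample data; the schema is a simplified version of the question's."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE articles (pk INTEGER PRIMARY KEY, title TEXT,
                               text TEXT, publicationTime TEXT);
        CREATE TABLE channels (pk INTEGER PRIMARY KEY, id TEXT UNIQUE);
        CREATE TABLE articles_channels (article INTEGER, channel INTEGER,
                                        UNIQUE (article, channel));
    """)
    conn.executemany("INSERT INTO articles VALUES (?, ?, ?, ?)", [
        (1, "A1", "body 1", "2024-01-01 00:00:00"),
        (2, "A2", "body 2", "2024-01-02 00:00:00"),
        (3, "A3", "body 3", "2024-01-03 00:00:00"),
        (4, "A4", "body 4", "2024-01-04 00:00:00"),  # attached to no channel
    ])
    conn.executemany("INSERT INTO channels VALUES (?, ?)",
                     [(1, "c1"), (2, "c2"), (3, "c3")])
    conn.executemany("INSERT INTO articles_channels VALUES (?, ?)",
                     [(1, 1), (2, 1), (2, 2), (3, 1)])
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    pks = top_article_pks(conn, ["c1"], ["c2", "c3"])
    for row in fetch_articles(conn, pks):
        print(row)
```

With the sample data, article 2 is dropped because it is also in 'c2', and article 4 is dropped because it has no channel at all. The second query only ever reads the handful of rows selected by the first, which is the point of the split.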

If you want only one row per article, here is an alternative approach that might use the index:

SELECT a.publicationTime, cin, cout
FROM articles a join
     (select ac.article, sum(c.id in ('c1')) as cin, sum(c.id in ('c2', 'c3')) as cout
      from articles_channels ac join
           channels c
           on ac.channel = c.pk
      group by ac.article
     ) ac
     on ac.article = a.pk and cin >= 1 and cout = 0
ORDER BY a.publicationTime DESC 
LIMIT 1, 10;

Note that the left join is unnecessary because of the conditions on cin and cout: an article with no channels can never satisfy cin >= 1.

If this doesn't work, then a version using correlated subqueries would be very likely to use the index.

EDIT:

The last attempt is:

SELECT a.publicationTime,
       (select sum(c.id in ('c1'))
        from articles_channels ac join
             channels c
             on ac.channel = c.pk
        where ac.article = a.pk
       ) as cin,
       (select sum(c.id in ('c2', 'c3'))
        from articles_channels ac join
             channels c
             on ac.channel = c.pk
        where ac.article = a.pk
       ) as cout
FROM articles a 
HAVING cin >= 1 and cout = 0
ORDER BY a.publicationTime DESC 
LIMIT 1, 10;
