I'm really trying to get this right, but I can't figure out what's going on. The problem is that MySQL falls back to a filesort when I order by an indexed column in a join.
Three tables:
CREATE TABLE IF NOT EXISTS articles (
pk int unsigned NOT NULL AUTO_INCREMENT,
id varchar(254) NOT NULL,
title VARCHAR(128) DEFAULT NULL,
text VARCHAR(4096) DEFAULT NULL,
publicationTime DATETIME DEFAULT NULL,
KEY publicationTime (publicationTime),
PRIMARY KEY (pk)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE IF NOT EXISTS articles_channels (
pk int unsigned NOT NULL AUTO_INCREMENT,
article int unsigned NOT NULL,
channel int unsigned NOT NULL,
UNIQUE KEY ac (article,channel),
PRIMARY KEY(pk)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE IF NOT EXISTS channels (
pk INT UNSIGNED NOT NULL AUTO_INCREMENT,
id VARCHAR(64) NOT NULL,
UNIQUE KEY id (id),
PRIMARY KEY (pk)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Articles are attached to zero or more channels via the articles_channels table.
The aim of the query is to find articles that are included in one set of channels (S1) and excluded from another set (S2), ordered by publicationTime descending.
For example: give me the first 10 articles (by publicationTime) that belong to channel 'c1' and are not in 'c2' or 'c3'.
The query:
SELECT
SUM(case WHEN c.id IN ('c1') THEN 1 ELSE 0 END) as cin,
SUM(case WHEN c.id IN ('c2','c3') THEN 1 ELSE 0 END) as cout,
a.publicationTime
FROM articles a
LEFT JOIN articles_channels ac ON ac.article=a.pk
LEFT JOIN channels c ON ac.channel=c.pk
GROUP BY a.pk HAVING (cin>=1) AND cout=0
ORDER BY a.publicationTime DESC
LIMIT 1,10;
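To make the intended semantics concrete, here is a minimal sketch that runs the query above on a tiny hypothetical dataset. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so it only shows which rows come back, not MySQL's plan (filesort or not). Two assumptions: a.pk is added to the select list so the result is easy to check, and LIMIT 10 replaces LIMIT 1,10, because in MySQL LIMIT 1,10 starts at offset 1 and skips the newest matching article.

```python
import sqlite3

# Hypothetical toy data; SQLite stands in for MySQL, so only the returned rows matter here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (pk INTEGER PRIMARY KEY, id TEXT NOT NULL,
                       title TEXT, publicationTime TEXT);
CREATE INDEX idx_publicationTime ON articles (publicationTime);
CREATE TABLE articles_channels (pk INTEGER PRIMARY KEY, article INTEGER NOT NULL,
                                channel INTEGER NOT NULL, UNIQUE (article, channel));
CREATE TABLE channels (pk INTEGER PRIMARY KEY, id TEXT NOT NULL UNIQUE);
INSERT INTO channels (pk, id) VALUES (1, 'c1'), (2, 'c2'), (3, 'c3'), (4, 'c4');
INSERT INTO articles (pk, id, title, publicationTime) VALUES
  (1, 'a1', 'A1', '2020-01-01'), (2, 'a2', 'A2', '2020-01-02'),
  (3, 'a3', 'A3', '2020-01-03'), (4, 'a4', 'A4', '2020-01-04'),
  (5, 'a5', 'A5', '2020-01-05');
-- a1: c1 | a2: c1, c2 | a3: c4 | a4: c1, c4 | a5: no channels
INSERT INTO articles_channels (article, channel) VALUES
  (1, 1), (2, 1), (2, 2), (3, 4), (4, 1), (4, 4);
""")

# The question's query, with a.pk added and LIMIT 10 instead of LIMIT 1,10.
rows = conn.execute("""
SELECT a.pk,
       SUM(CASE WHEN c.id IN ('c1') THEN 1 ELSE 0 END) AS cin,
       SUM(CASE WHEN c.id IN ('c2','c3') THEN 1 ELSE 0 END) AS cout,
       a.publicationTime
FROM articles a
LEFT JOIN articles_channels ac ON ac.article = a.pk
LEFT JOIN channels c ON ac.channel = c.pk
GROUP BY a.pk HAVING cin >= 1 AND cout = 0
ORDER BY a.publicationTime DESC
LIMIT 10
""").fetchall()

pks = [r[0] for r in rows]
print(pks)  # [4, 1]
```

Only a4 and a1 survive: a2 is in c2, a3 is not in c1, and a5 has no channels, so its cin is 0.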
The query gives the following explain:
When I change ORDER BY a.publicationTime to ORDER BY a.pk, the 'Using temporary, Using filesort' disappears. I just can't understand why. I'm very thankful for any help.
BR
Niclas
If it helps anyone: the (half) solution was to split the query in two, so that the full scan no longer spilled to disk. It was much cheaper to select only the primary keys in the full scan, which allowed it to run in memory, and then to execute a second query against the main table with WHERE pk IN (pk1,pk2,pk3,..). The underlying problem still exists, but in my case everything could run in memory, which greatly improved performance.
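Here is a minimal sketch of that two-query idea on hypothetical toy data, again with SQLite standing in for MySQL. The poster's actual first query isn't shown, so the narrow step-1 query below is an assumption modeled on the question's query. Step 1 returns only the keys in the desired order. Step 2 fetches the wide columns for just those keys. Because WHERE pk IN (...) does not guarantee any particular order, the sketch restores the step-1 order in application code.

```python
import sqlite3

# Hypothetical toy data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (pk INTEGER PRIMARY KEY, id TEXT NOT NULL,
                       title TEXT, publicationTime TEXT);
CREATE TABLE articles_channels (pk INTEGER PRIMARY KEY, article INTEGER NOT NULL,
                                channel INTEGER NOT NULL, UNIQUE (article, channel));
CREATE TABLE channels (pk INTEGER PRIMARY KEY, id TEXT NOT NULL UNIQUE);
INSERT INTO channels (pk, id) VALUES (1, 'c1'), (2, 'c2'), (3, 'c3'), (4, 'c4');
INSERT INTO articles (pk, id, title, publicationTime) VALUES
  (1, 'a1', 'A1', '2020-01-01'), (2, 'a2', 'A2', '2020-01-02'),
  (3, 'a3', 'A3', '2020-01-03'), (4, 'a4', 'A4', '2020-01-04'),
  (5, 'a5', 'A5', '2020-01-05');
INSERT INTO articles_channels (article, channel) VALUES
  (1, 1), (2, 1), (2, 2), (3, 4), (4, 1), (4, 4);
""")

# Step 1: the expensive grouped scan returns only the primary keys (narrow rows).
pks = [r[0] for r in conn.execute("""
SELECT a.pk
FROM articles a
LEFT JOIN articles_channels ac ON ac.article = a.pk
LEFT JOIN channels c ON ac.channel = c.pk
GROUP BY a.pk
HAVING SUM(CASE WHEN c.id IN ('c1') THEN 1 ELSE 0 END) >= 1
   AND SUM(CASE WHEN c.id IN ('c2','c3') THEN 1 ELSE 0 END) = 0
ORDER BY a.publicationTime DESC
LIMIT 10
""")]

# Step 2: fetch the wide columns only for those keys.
articles = []
if pks:
    placeholders = ",".join("?" * len(pks))
    rows = conn.execute(
        f"SELECT pk, id, title, publicationTime FROM articles WHERE pk IN ({placeholders})",
        pks,
    ).fetchall()
    # IN (...) does not preserve order, so reorder by the step-1 key list.
    by_pk = {r[0]: r for r in rows}
    articles = [by_pk[pk] for pk in pks]

print([r[0] for r in articles])  # [4, 1]
```

The design choice is simply to keep the rows carried through the grouping and sorting step as small as possible; the wide columns are looked up by primary key afterwards.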
If you want only one row per article, here is an alternative approach that might use the index:
SELECT a.publicationtime, cin, cout
FROM articles a join
(select ac.article, sum(c.id in ('c1')) as cin, sum(c.id in ('c2', 'c3')) as cout
from articles_channels ac
join channels c
on ac.channel = c.pk
group by ac.article
) ac
on ac.article = a.pk and cin >= 1 and cout = 0
ORDER BY a.publicationTime DESC
LIMIT 1, 10;
Note that the left join is unnecessary because of the conditions on cin and cout: cin >= 1 already requires at least one matching channel row.
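A minimal sketch that checks the derived-table query returns the same articles as the original on hypothetical toy data. It uses SQLite as a stand-in for MySQL, an explicit join between articles_channels and channels, and LIMIT 10 instead of LIMIT 1, 10. It says nothing about whether MySQL will use the publicationTime index; that still has to be checked with EXPLAIN on the real server.

```python
import sqlite3

# Hypothetical toy data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (pk INTEGER PRIMARY KEY, id TEXT NOT NULL,
                       title TEXT, publicationTime TEXT);
CREATE INDEX idx_publicationTime ON articles (publicationTime);
CREATE TABLE articles_channels (pk INTEGER PRIMARY KEY, article INTEGER NOT NULL,
                                channel INTEGER NOT NULL, UNIQUE (article, channel));
CREATE TABLE channels (pk INTEGER PRIMARY KEY, id TEXT NOT NULL UNIQUE);
INSERT INTO channels (pk, id) VALUES (1, 'c1'), (2, 'c2'), (3, 'c3'), (4, 'c4');
INSERT INTO articles (pk, id, title, publicationTime) VALUES
  (1, 'a1', 'A1', '2020-01-01'), (2, 'a2', 'A2', '2020-01-02'),
  (3, 'a3', 'A3', '2020-01-03'), (4, 'a4', 'A4', '2020-01-04'),
  (5, 'a5', 'A5', '2020-01-05');
INSERT INTO articles_channels (article, channel) VALUES
  (1, 1), (2, 1), (2, 2), (3, 4), (4, 1), (4, 4);
""")

# Aggregate per article first, then join the small result back to articles.
# c.id IN (...) evaluates to 1 or 0, so SUM counts matching channels.
rows = conn.execute("""
SELECT a.pk, a.publicationTime, ac.cin, ac.cout
FROM articles a
JOIN (SELECT ac.article,
             SUM(c.id IN ('c1')) AS cin,
             SUM(c.id IN ('c2', 'c3')) AS cout
      FROM articles_channels ac
      JOIN channels c ON ac.channel = c.pk
      GROUP BY ac.article) ac
  ON ac.article = a.pk AND ac.cin >= 1 AND ac.cout = 0
ORDER BY a.publicationTime DESC
LIMIT 10
""").fetchall()

pks = [r[0] for r in rows]
print(pks)  # [4, 1]
```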
If this doesn't work, then a version using correlated subqueries would be very likely to use the index.
EDIT:
The last attempt is:
SELECT a.publicationtime,
(select sum(c.id in ('c1'))
from articles_channels ac
join channels c
on ac.channel = c.pk
where ac.article = a.pk
) as cin,
(select sum(c.id in ('c2', 'c3'))
from articles_channels ac
join channels c
on ac.channel = c.pk
where ac.article = a.pk
) as cout
FROM articles a
HAVING cin >= 1 and cout = 0
ORDER BY a.publicationTime DESC
LIMIT 1, 10;
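A minimal sketch of the correlated-subquery version on hypothetical toy data, with SQLite standing in for MySQL and LIMIT 10 instead of LIMIT 1, 10. One swap: SQLite does not support MySQL's use of HAVING on select-list aliases in a query without GROUP BY, so the sketch computes cin and cout in a derived table and filters them in an outer WHERE instead. An article with no channels gets NULL from SUM, and NULL >= 1 is not true, so it is filtered out.

```python
import sqlite3

# Hypothetical toy data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (pk INTEGER PRIMARY KEY, id TEXT NOT NULL,
                       title TEXT, publicationTime TEXT);
CREATE INDEX idx_publicationTime ON articles (publicationTime);
CREATE TABLE articles_channels (pk INTEGER PRIMARY KEY, article INTEGER NOT NULL,
                                channel INTEGER NOT NULL, UNIQUE (article, channel));
CREATE TABLE channels (pk INTEGER PRIMARY KEY, id TEXT NOT NULL UNIQUE);
INSERT INTO channels (pk, id) VALUES (1, 'c1'), (2, 'c2'), (3, 'c3'), (4, 'c4');
INSERT INTO articles (pk, id, title, publicationTime) VALUES
  (1, 'a1', 'A1', '2020-01-01'), (2, 'a2', 'A2', '2020-01-02'),
  (3, 'a3', 'A3', '2020-01-03'), (4, 'a4', 'A4', '2020-01-04'),
  (5, 'a5', 'A5', '2020-01-05');
INSERT INTO articles_channels (article, channel) VALUES
  (1, 1), (2, 1), (2, 2), (3, 4), (4, 1), (4, 4);
""")

# Each article's channel counts come from correlated subqueries on a.pk.
# The outer WHERE replaces MySQL's HAVING-without-GROUP-BY filter.
rows = conn.execute("""
SELECT pk, publicationTime, cin, cout
FROM (SELECT a.pk, a.publicationTime,
             (SELECT SUM(c.id IN ('c1'))
              FROM articles_channels ac
              JOIN channels c ON ac.channel = c.pk
              WHERE ac.article = a.pk) AS cin,
             (SELECT SUM(c.id IN ('c2', 'c3'))
              FROM articles_channels ac
              JOIN channels c ON ac.channel = c.pk
              WHERE ac.article = a.pk) AS cout
      FROM articles a) t
WHERE cin >= 1 AND cout = 0
ORDER BY publicationTime DESC
LIMIT 10
""").fetchall()

pks = [r[0] for r in rows]
print(pks)  # [4, 1]
```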