This query is taking over a minute to complete:
SELECT keyword, count(*) as 'Number of Occurences'
FROM movie_keyword
JOIN
keyword
ON keyword.`id` = movie_keyword.`keyword_id`
GROUP BY keyword
ORDER BY count(*) DESC
LIMIT 5
Every keyword has an ID associated with it (keyword_id column). And that ID is used to look up the actual keyword from the keyword table.
movie_keyword has 2.8 million rows
keyword has 127,000
However to return just the most used keyword_id's takes only 1 second:
SELECT keyword_id, count(*)
FROM movie_keyword
GROUP BY keyword_id
ORDER BY count(*) DESC
LIMIT 5
Is there a more efficient way of doing this?
Output with EXPLAIN:
1 SIMPLE keyword ALL PRIMARY NULL NULL NULL 125405 Using temporary; Using filesort
1 SIMPLE movie_keyword ref idx_keywordid idx_keywordid 4 imdb.keyword.id 28 Using index
Structure:
CREATE TABLE `movie_keyword` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`movie_id` int(11) NOT NULL,
`keyword_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_mid` (`movie_id`),
KEY `idx_keywordid` (`keyword_id`),
KEY `keyword_ix` (`keyword_id`),
CONSTRAINT `movie_keyword_keyword_id_exists` FOREIGN KEY (`keyword_id`) REFERENCES `keyword` (`id`),
CONSTRAINT `movie_keyword_movie_id_exists` FOREIGN KEY (`movie_id`) REFERENCES `title` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4256379 DEFAULT CHARSET=latin1;
CREATE TABLE `keyword` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`keyword` text NOT NULL,
`phonetic_code` varchar(5) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_keyword` (`keyword`(5)),
KEY `idx_pcode` (`phonetic_code`),
KEY `keyword_ix` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=127044 DEFAULT CHARSET=latin1;
Untested but should work and be significantly faster in my opinion, not very sure if you're allowed to use limit in a subquery in mysql though, but there are other ways around that.
SELECT keyword, count(*) as 'Number of Occurences'
FROM movie_keyword
JOIN
keyword
ON keyword.`id` = movie_keyword.`keyword_id`
WHERE movie_keyword.keyword_id IN (
SELECT keyword_id
FROM movie_keyword
GROUP BY keyword
ORDER BY count(*) DESC
LIMIT 5
)
GROUP BY keyword
ORDER BY count(*) DESC;
This should be faster because you don't join all the 2.8 million entries in movie_keyword with keyword, just the ones that actually match, which I'm guessing are significantly less.
EDIT since mysql doesn't support limit inside a subquery you have to run
SELECT keyword_id
FROM movie_keyword
GROUP BY keyword
ORDER BY count(*) DESC
LIMIT 5;
first and after fetching the results run the second query
SELECT keyword, count(*) as 'Number of Occurences'
FROM movie_keyword
JOIN
keyword
ON keyword.`id` = movie_keyword.`keyword_id`
WHERE movie_keyword.keyword_id IN (RESULTS_FROM_FIRST_QUERY_SEPARATED_BY_COMMAS)
GROUP BY keyword
ORDER BY count(*) DESC;
replace RESULTS_FROM_FIRST_QUERY_SEPARATED_BY_COMMAS
with the proper values programatically from whatever language you're using
The query seems fine but I think the structure is not, try to give index on columns
keyword.id
try,
CREATE INDEX keyword_ix ON keyword (id);
or
ALTER TABLE keyword ADD INDEX keyword_ix (id);
much better if you can post the structures of your tables: keyword
and Movie_keyword
. Which of the two is the main table and the referencing table?
SELECT keyword, count(movie_keyword.id) as 'Number of Occurences'
FROM movie_keyword
INNER JOIN keyword
ON keyword.`id` = movie_keyword.`keyword_id`
GROUP BY keyword
ORDER BY 'Number of Occurences' DESC
LIMIT 5
I know this is pretty old question, but because I think that xception forgot about delivery tables in mysql, I want to suggest another solution. It requires only one query and it omits joining big data. If someone has such big data and can test it ( maybe question creator ), please share results.
SELECT keyword.keyword, _temp.occurences
FROM (
SELECT keyword_id, COUNT( keyword_id ) AS occurences
FROM movie_keyword
GROUP BY keyword_id
ORDER BY occurences DESC
LIMIT 5
) AS _temp
JOIN keyword ON _temp.keyword_id = keyword.id
ORDER BY _temp.occurences DESC
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.