
MySQL JOIN time reduction

This query is taking over a minute to complete:

SELECT keyword, count(*) as 'Number of Occurences'
    FROM movie_keyword
    JOIN
    keyword
    ON keyword.`id` = movie_keyword.`keyword_id`
    GROUP BY keyword
    ORDER BY count(*) DESC
    LIMIT 5

Every keyword has an ID associated with it (keyword_id column). And that ID is used to look up the actual keyword from the keyword table.

movie_keyword has 2.8 million rows

keyword has 127,000

However, returning just the most-used keyword_ids takes only 1 second:

SELECT keyword_id, count(*)
    FROM movie_keyword
    GROUP BY keyword_id
    ORDER BY count(*) DESC
    LIMIT 5

Is there a more efficient way of doing this?

Output with EXPLAIN:

id  select_type  table          type  possible_keys  key            key_len  ref              rows    Extra
1   SIMPLE       keyword        ALL   PRIMARY        NULL           NULL     NULL             125405  Using temporary; Using filesort
1   SIMPLE       movie_keyword  ref   idx_keywordid  idx_keywordid  4        imdb.keyword.id  28      Using index

Structure:

CREATE TABLE `movie_keyword` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `movie_id` int(11) NOT NULL,
  `keyword_id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_mid` (`movie_id`),
  KEY `idx_keywordid` (`keyword_id`),
  KEY `keyword_ix` (`keyword_id`),
  CONSTRAINT `movie_keyword_keyword_id_exists` FOREIGN KEY (`keyword_id`) REFERENCES `keyword` (`id`),
  CONSTRAINT `movie_keyword_movie_id_exists` FOREIGN KEY (`movie_id`) REFERENCES `title` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4256379 DEFAULT CHARSET=latin1;

CREATE TABLE `keyword` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `keyword` text NOT NULL,
  `phonetic_code` varchar(5) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_keyword` (`keyword`(5)),
  KEY `idx_pcode` (`phonetic_code`),
  KEY `keyword_ix` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=127044 DEFAULT CHARSET=latin1;

Untested, but this should work and, in my opinion, be significantly faster. I'm not sure whether MySQL allows LIMIT in a subquery, but there are other ways around that.

SELECT keyword, count(*) as 'Number of Occurences'
    FROM movie_keyword
    JOIN
    keyword
    ON keyword.`id` = movie_keyword.`keyword_id`
    WHERE movie_keyword.keyword_id IN (
        SELECT keyword_id
        FROM movie_keyword
        GROUP BY keyword_id
        ORDER BY count(*) DESC    
        LIMIT 5
    )
    GROUP BY keyword
    ORDER BY count(*) DESC;

This should be faster because you don't join all 2.8 million rows in movie_keyword with keyword, only the rows for the top keyword_ids, which I'm guessing are significantly fewer.

EDIT: since MySQL doesn't support LIMIT inside an IN subquery, you have to run

SELECT keyword_id
FROM movie_keyword
GROUP BY keyword_id
ORDER BY count(*) DESC    
LIMIT 5;

first and, after fetching the results, run the second query:

SELECT keyword, count(*) as 'Number of Occurences'
    FROM movie_keyword
    JOIN
    keyword
    ON keyword.`id` = movie_keyword.`keyword_id`
    WHERE movie_keyword.keyword_id IN (RESULTS_FROM_FIRST_QUERY_SEPARATED_BY_COMMAS)
    GROUP BY keyword
    ORDER BY count(*) DESC;

Replace RESULTS_FROM_FIRST_QUERY_SEPARATED_BY_COMMAS with the actual values, built programmatically in whatever language you're using.
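The two-step approach above could be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with a tiny in-memory copy of the two tables as a stand-in for the real MySQL database, the function name top_keywords and the sample rows are made up, and the extra tie-breaker in ORDER BY is added only to make the demo deterministic. The IN list is built from placeholders rather than by pasting values into the SQL string; MySQL drivers for Python typically use %s instead of ? as the placeholder.

```python
import sqlite3


def top_keywords(conn, limit=5):
    """Two-step lookup: find the top keyword_ids, then fetch their names."""
    cur = conn.cursor()
    # Step 1: the fast query from the question, on movie_keyword alone.
    cur.execute(
        "SELECT keyword_id FROM movie_keyword "
        "GROUP BY keyword_id ORDER BY COUNT(*) DESC, keyword_id LIMIT ?",
        (limit,),
    )
    ids = [row[0] for row in cur.fetchall()]
    if not ids:
        return []
    # Step 2: one placeholder per id, so no values are pasted into the SQL.
    placeholders = ", ".join("?" * len(ids))
    cur.execute(
        "SELECT keyword.keyword, COUNT(*) AS occurrences "
        "FROM movie_keyword "
        "JOIN keyword ON keyword.id = movie_keyword.keyword_id "
        f"WHERE movie_keyword.keyword_id IN ({placeholders}) "
        "GROUP BY keyword.id, keyword.keyword "
        "ORDER BY occurrences DESC, keyword.id",
        ids,
    )
    return cur.fetchall()


# Hypothetical sample data; the real tables are far larger.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL);
    CREATE TABLE movie_keyword (
        id INTEGER PRIMARY KEY,
        movie_id INTEGER NOT NULL,
        keyword_id INTEGER NOT NULL
    );
    INSERT INTO keyword (id, keyword)
    VALUES (1, 'murder'), (2, 'sequel'), (3, 'robot');
    INSERT INTO movie_keyword (movie_id, keyword_id)
    VALUES (10, 1), (11, 1), (12, 1), (10, 2), (11, 2), (12, 3);
""")

print(top_keywords(conn, limit=2))
```

The second query only touches rows for the ids returned by the first, which is the point of splitting the work.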

The query seems fine, but I think the table structure is not. Try adding an index on this column:

keyword.id

For example:

CREATE INDEX keyword_ix ON keyword (id);

or

ALTER TABLE keyword ADD INDEX keyword_ix (id);

It would be much better if you could post the structures of your tables, keyword and movie_keyword. Which of the two is the main table, and which is the referencing table?

SELECT keyword, count(movie_keyword.id) as 'Number of Occurences'
FROM movie_keyword
     INNER JOIN  keyword
           ON keyword.`id` = movie_keyword.`keyword_id`
GROUP BY keyword
-- Backticks refer to the alias; in single quotes it is a string constant and sorts nothing
ORDER BY `Number of Occurences` DESC
LIMIT 5

I know this is a pretty old question, but I think xception forgot about derived tables in MySQL, so I want to suggest another solution. It needs only one query and avoids joining the full movie_keyword table before aggregating. If someone has data this large and can test it (maybe the question's author), please share the results.

SELECT keyword.keyword, _temp.occurences
FROM (
  SELECT keyword_id, COUNT( keyword_id ) AS occurences
  FROM movie_keyword
  GROUP BY keyword_id
  ORDER BY occurences DESC 
  LIMIT 5
) AS _temp
JOIN keyword ON _temp.keyword_id = keyword.id
ORDER BY _temp.occurences DESC
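For anyone who wants to sanity-check the rewrite before timing it on real data, here is a small sketch that runs the original join-then-aggregate query and the derived-table query against the same toy data and compares the results. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows and variable names are made up. It says nothing about speed, only that the two forms return the same rows when every keyword text is unique and there are no ties in the counts.

```python
import sqlite3

# Hypothetical sample data standing in for the imdb tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL);
    CREATE TABLE movie_keyword (
        id INTEGER PRIMARY KEY,
        movie_id INTEGER NOT NULL,
        keyword_id INTEGER NOT NULL REFERENCES keyword (id)
    );
    INSERT INTO keyword (id, keyword)
    VALUES (1, 'murder'), (2, 'sequel'), (3, 'robot');
    INSERT INTO movie_keyword (movie_id, keyword_id)
    VALUES (10, 1), (11, 1), (12, 1), (10, 2), (11, 2), (12, 3);
""")

# Original shape: join every movie_keyword row first, then aggregate.
JOIN_FIRST = """
    SELECT keyword.keyword, COUNT(*) AS occurences
    FROM movie_keyword
    JOIN keyword ON keyword.id = movie_keyword.keyword_id
    GROUP BY keyword.keyword
    ORDER BY occurences DESC
    LIMIT 2
"""

# Derived-table shape: aggregate and limit first, then join the few survivors.
AGGREGATE_FIRST = """
    SELECT keyword.keyword, _temp.occurences
    FROM (
        SELECT keyword_id, COUNT(keyword_id) AS occurences
        FROM movie_keyword
        GROUP BY keyword_id
        ORDER BY occurences DESC
        LIMIT 2
    ) AS _temp
    JOIN keyword ON _temp.keyword_id = keyword.id
    ORDER BY _temp.occurences DESC
"""

join_first = conn.execute(JOIN_FIRST).fetchall()
aggregate_first = conn.execute(AGGREGATE_FIRST).fetchall()
print(join_first == aggregate_first)
```

One caveat: the original query groups by the keyword text while the derived-table version groups by keyword_id, so the two could differ if the same text appeared under more than one id.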
