Optimize a join query with SUM, GROUP BY and ORDER BY clauses

I have the following database schema:

keywords(id, keyword, lang): about 8M records
topics(id, topic, lang): about 2.6M records
topic_keywords(topic_id, keyword_id, weight): 200M records

In a script, I have about 50-100 keywords, each with an additional field keyword_score, and I want to retrieve the top 20 topics that correspond to those keywords, ranked by the following formula: SUM(keyword_score * topic_weight)
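To make the ranking formula concrete, here is a minimal pure-Python sketch on made-up data. The dictionaries, ids, and the helper name `top_topics` are hypothetical and only illustrate the arithmetic; they are not taken from the real tables.

```python
from collections import defaultdict

# Hypothetical toy data: keyword_id -> keyword_score (stands in for the 50-100 script keywords).
keyword_scores = {1: 2.0, 2: 0.5}

# Hypothetical rows of topic_keywords as (topic_id, keyword_id, weight).
topic_keywords = [
    (10, 1, 0.5),
    (10, 2, 1.0),
    (11, 1, 0.25),
    (12, 3, 4.0),  # keyword 3 is not among the script keywords, so topic 12 gets no score
]

def top_topics(keyword_scores, topic_keywords, limit=20):
    """Rank topics by SUM(keyword_score * weight) over the matching keywords."""
    totals = defaultdict(float)
    for topic_id, keyword_id, weight in topic_keywords:
        if keyword_id in keyword_scores:
            totals[topic_id] += keyword_scores[keyword_id] * weight
    # Highest score first, keep at most `limit` topics.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:limit]

ranked = top_topics(keyword_scores, topic_keywords)
print(ranked)
```

Topic 10 scores 2.0 * 0.5 + 0.5 * 1.0 = 1.5 and topic 11 scores 2.0 * 0.25 = 0.5, so topic 10 ranks first.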

The solution I currently use in my script is:

  • Create a temporary table as follows: temporary_keywords(keyword_id, keyword_score)
  • Insert all 50-100 keywords into it with their keyword_score
  • Execute the following query to retrieve the topics

     SELECT topic_id, SUM(weight * keyword_score) AS score FROM temporary_keywords JOIN topic_keywords USING keyword_id GROUP BY topic_id ORDER BY score DESC LIMIT 20 

This solution works, but in some cases it takes up to 3 seconds to execute, which is too slow for me.

Is there a way to optimize this query, or should I redesign the data structure and move to a NoSQL database?

Any other solutions or ideas beyond those listed above are much appreciated.

UPDATE (SHOW CREATE TABLE)

CREATE TABLE `topic_keywords` (
  `topic_id` int(11) NOT NULL,
  `keyword_id` int(11) NOT NULL,
  `weight` float DEFAULT '0',
  PRIMARY KEY (`topic_id`,`keyword_id`),
  KEY `keyword_id_idx` (`keyword_id`,`topic_id`,`weight`)
);

CREATE TEMPORARY TABLE temporary_keywords 
(   keyword_id INT PRIMARY KEY NOT NULL,
    keyword_score  DOUBLE 
);

EXPLAIN QUERY

+----+-------------+--------------------+------+----------------------+----------------------+---------+--------------------------------------+----------+---------------------------------+
| id | select_type | table              | type | possible_keys        | key                  | key_len | ref                                  | rows     | Extra                           |
+----+-------------+--------------------+------+----------------------+----------------------+---------+--------------------------------------+----------+---------------------------------+
|  1 | SIMPLE      | temporary_keywords | ALL  | PRIMARY              | NULL                 | NULL    | NULL                                 |      100 | Using temporary; Using filesort |
|  1 | SIMPLE      | topic_keywords     | ref  | keyword_id_idx       | keyword_id_idx       | 4       | topics.temporary_keywords.keyword_id | 10778853 | Using index                     |
+----+-------------+--------------------+------+----------------------+----------------------+---------+--------------------------------------+----------+---------------------------------+

The query contains incorrect syntax that was not caught:

JOIN topic_keywords USING keyword_id

-->

JOIN topic_keywords USING(keyword_id)
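As a rough illustration, the corrected join can be run end to end on toy data. This sketch uses Python's built-in sqlite3 module rather than MySQL, purely so that it runs without a server. The rows are invented, the column types are SQLite approximations of the definitions in the question, and it says nothing about how the query performs on 200M rows.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topic_keywords (
        topic_id   INTEGER NOT NULL,
        keyword_id INTEGER NOT NULL,
        weight     REAL DEFAULT 0,
        PRIMARY KEY (topic_id, keyword_id)
    );
    CREATE INDEX keyword_id_idx ON topic_keywords (keyword_id, topic_id, weight);
    CREATE TEMPORARY TABLE temporary_keywords (
        keyword_id    INTEGER PRIMARY KEY NOT NULL,
        keyword_score REAL
    );
""")

# Hypothetical toy rows: (topic_id, keyword_id, weight).
conn.executemany(
    "INSERT INTO topic_keywords (topic_id, keyword_id, weight) VALUES (?, ?, ?)",
    [(10, 1, 0.5), (10, 2, 1.0), (11, 1, 0.25), (12, 3, 4.0)],
)
# Hypothetical script keywords: (keyword_id, keyword_score).
conn.executemany(
    "INSERT INTO temporary_keywords (keyword_id, keyword_score) VALUES (?, ?)",
    [(1, 2.0), (2, 0.5)],
)

# Same query as in the question, with the USING column list parenthesized.
QUERY = """
    SELECT topic_id, SUM(weight * keyword_score) AS score
    FROM temporary_keywords
    JOIN topic_keywords USING (keyword_id)
    GROUP BY topic_id
    ORDER BY score DESC
    LIMIT 20
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```

Topic 12 is absent from the result because its only keyword is not in temporary_keywords, so the inner join drops it.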

If that does not fix it, please provide the output of EXPLAIN FORMAT=JSON SELECT ...
