I have the following database schema
keywords(id, keyword, lang) :( about 8M records)
topics(id, topic, lang) : ( about 2.6M records)
topic_keywords(topic_id, keyword_id, weight) : (200M records)
In a script, I have about 50-100 keywords with an additional field keyword_score
and I want to retrieve the top 20 topics that corresponds to those keywords based on the following formula : SUM(keyword_score * topic_weight)
A solution I implemented currently in my script is :
temporary_keywords(keyword_id, keyword_score )
keyword_score
Then execute the following query to retrieve topics
SELECT topic_id, SUM(weight * keyword_score) AS score FROM temporary_keywords JOIN topic_keywords USING keyword_id GROUP BY topic_id ORDER BY score DESC LIMIT 20
This solution works, but it takes in some cases up to 3 seconds to execute, which is too much for me.
I'm asking if there is a way to optimize this query? or should I redesign the data structure into a NoSQL database?
Any other solutions or ideas beyond what is listed above are most appreciated
UPDATE (SHOW CREATE TABLE)
CREATE TABLE `topic_keywords` (
`topic_id` int(11) NOT NULL,
`keyword_id` int(11) NOT NULL,
`weight` float DEFAULT '0',
PRIMARY KEY (`topic_id`,`keyword_id`),
KEY `keyword_id_idx` (`keyword_id`,`topic_id`,`weight`)
)
CREATE TEMPORARY TABLE temporary_keywords
( keyword_id INT PRIMARY KEY NOT NULL,
keyword_score DOUBLE
)
EXPLAIN QUERY
+----+-------------+--------------------+------+----------------------+----------------------+---------+--------------------------------------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------+------+----------------------+----------------------+---------+--------------------------------------+----------+---------------------------------+
| 1 | SIMPLE | temporary_keywords | ALL | PRIMARY | NULL | NULL | NULL | 100 | Using temporary; Using filesort |
| 1 | SIMPLE | topic_keywords | ref | keyword_id_idx | keyword_id_idx | 4 | topics.temporary_keywords.keyword_id | 10778853 | Using index |
+----+-------------+--------------------+------+----------------------+----------------------+---------+--------------------------------------+----------+---------------------------------+
Incorrect, but uncaught, syntax.
JOIN topic_keywords USING keyword_id
-->
JOIN topic_keywords USING(keyword_id)
If that does not fix it, please provide EXPLAIN FORMAT=JSON SELECT ...
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.