Hello, I am looking for ways to optimize a MySQL query. I am fetching the articles for a user that belong to category_id = 25 and whose source_id is not in a table where I store the IDs of the sources the user has unsubscribed from.
select
a.article_id,
a.article_title,
a.source_id,
a.article_publish_date,
a.article_details,
n.source_name
from sources n
INNER JOIN articles a
ON (a.source_id = n.source_id)
WHERE n.category_id = 25
AND n.source_id NOT IN(select
source_id
from news_sources_deselected
WHERE user_id = 5)
ORDER BY a.article_publish_date DESC
Schema for Articles Table
CREATE TABLE IF NOT EXISTS `articles` (
`article_id` int(255) NOT NULL auto_increment,
`article_title` varchar(255) NOT NULL,
`source_id` int(255) NOT NULL,
`article_publish_date` bigint(255) NOT NULL,
`article_details` text NOT NULL,
PRIMARY KEY (`article_id`),
KEY `source_id` (`source_id`),
KEY `article_publish_date` (`article_publish_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Contains articles.';
Structure for Sources table
CREATE TABLE IF NOT EXISTS `sources` (
`source_id` int(255) NOT NULL auto_increment,
`category_id` int(255) NOT NULL,
`source_name` varchar(255) character set latin1 NOT NULL,
`user_id` int(255) NOT NULL,
PRIMARY KEY (`source_id`),
KEY `category_id` (`category_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='News Sources.';
The articles table has around 300,000 records and the sources table around 1,000. The query takes around 180 seconds to execute.
Any help will be greatly appreciated.
Try using a derived table with an IS NULL condition. Your EXPLAIN output shows a dependent subquery. Avoid it and use a derived table instead. This should improve performance.
select
a.article_id,
a.article_title,
a.source_id,
a.article_publish_date,
a.article_details,
n.source_name
from sources n
INNER JOIN articles a
ON (a.source_id = n.source_id)
LEFT JOIN (SELECT *
FROM news_sources_deselected
WHERE user_id = 5) AS nsd
ON nsd.source_id = n.source_id
WHERE n.category_id = 25
AND nsd.source_id IS NULL
ORDER BY a.article_publish_date DESC
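To check that the LEFT JOIN / IS NULL form returns the same rows as the original NOT IN query, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for MySQL, with trimmed-down tables and invented sample rows, so it only illustrates the result set and says nothing about MySQL's execution plan or timing.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; tables are simplified and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sources (source_id INTEGER PRIMARY KEY, category_id INTEGER, source_name TEXT);
    CREATE TABLE articles (article_id INTEGER PRIMARY KEY, source_id INTEGER, article_publish_date INTEGER);
    CREATE TABLE news_sources_deselected (user_id INTEGER, source_id INTEGER);

    INSERT INTO sources VALUES (1, 25, 'A'), (2, 25, 'B'), (3, 25, 'C'), (4, 7, 'D');
    INSERT INTO articles VALUES (10, 1, 100), (11, 2, 200), (12, 3, 300), (13, 4, 400);
    -- user 5 unsubscribed from source 2; user 9 unsubscribed from sources 2 and 3
    INSERT INTO news_sources_deselected VALUES (5, 2), (9, 2), (9, 3);
""")

# Original form: subquery with NOT IN.
not_in = """
    SELECT a.article_id
    FROM sources n
    INNER JOIN articles a ON a.source_id = n.source_id
    WHERE n.category_id = 25
      AND n.source_id NOT IN (SELECT source_id
                              FROM news_sources_deselected
                              WHERE user_id = 5)
    ORDER BY a.article_publish_date DESC
"""

# Rewritten form: LEFT JOIN on a derived table, keeping only unmatched rows.
left_join = """
    SELECT a.article_id
    FROM sources n
    INNER JOIN articles a ON a.source_id = n.source_id
    LEFT JOIN (SELECT * FROM news_sources_deselected WHERE user_id = 5) AS nsd
           ON nsd.source_id = n.source_id
    WHERE n.category_id = 25
      AND nsd.source_id IS NULL
    ORDER BY a.article_publish_date DESC
"""

rows_not_in = [r[0] for r in conn.execute(not_in)]
rows_left_join = [r[0] for r in conn.execute(left_join)]
print(rows_not_in, rows_left_join)
```

Both queries keep the articles from sources 1 and 3 and drop source 2, which user 5 unsubscribed from.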
Put EXPLAIN in front of your query and analyze the results.
Here you can find how to start your optimization work.
I see a few issues you could check.
Do you need all those rows at once? Maybe consider splitting the result into multiple pages (paging)?
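As a rough illustration of paging, the sketch below fetches the newest articles a few at a time with LIMIT and OFFSET. It assumes an in-memory SQLite database in place of MySQL, a cut-down articles table with invented rows, and a hypothetical helper named fetch_page. The same LIMIT ... OFFSET clause is also valid in MySQL.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the table is simplified and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (article_id INTEGER PRIMARY KEY, article_publish_date INTEGER)"
)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [(i, 1000 + i) for i in range(1, 8)],  # ids 1..7, later ids are newer
)

def fetch_page(page, page_size=3):
    """Return one page of article ids, newest first (hypothetical helper; pages start at 0)."""
    sql = """
        SELECT article_id
        FROM articles
        ORDER BY article_publish_date DESC
        LIMIT ? OFFSET ?
    """
    return [r[0] for r in conn.execute(sql, (page_size, page * page_size))]

first_page = fetch_page(0)
last_page = fetch_page(2)
print(first_page, last_page)
```

With large offsets the server still has to walk past the skipped rows, so for deep pages it may be worth filtering on article_publish_date of the last row seen instead.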
Try this query
select
a.article_id,
a.article_title,
a.source_id,
a.article_publish_date,
a.article_details,
n.source_name
from
sources n
INNER JOIN
articles a
ON
n.category_id = 25 AND
a.source_id = n.source_id
INNER JOIN
news_sources_deselected nsd
ON
nsd.user_id <> 5 AND n.source_id = nsd.source_id
ORDER BY
a.article_publish_date DESC
I have removed the extra query and added news_sources_deselected in the join, accepting all source_id values for any user_id other than 5.
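Before relying on this form, it is worth checking that it returns the same rows as the original NOT IN query. The sketch below is one way to compare them, assuming an in-memory SQLite database instead of MySQL and invented sample rows. With this data the two queries disagree: the INNER JOIN keeps only sources that some other user has deselected, so a source nobody deselected disappears, and a source user 5 deselected comes back if another user deselected it too.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; tables are simplified and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sources (source_id INTEGER PRIMARY KEY, category_id INTEGER, source_name TEXT);
    CREATE TABLE articles (article_id INTEGER PRIMARY KEY, source_id INTEGER, article_publish_date INTEGER);
    CREATE TABLE news_sources_deselected (user_id INTEGER, source_id INTEGER);

    INSERT INTO sources VALUES (1, 25, 'A'), (2, 25, 'B'), (3, 25, 'C');
    INSERT INTO articles VALUES (10, 1, 100), (11, 2, 200), (12, 3, 300);
    -- user 5 unsubscribed from source 2; user 9 unsubscribed from sources 2 and 3
    INSERT INTO news_sources_deselected VALUES (5, 2), (9, 2), (9, 3);
""")

# Original form from the question.
not_in = """
    SELECT a.article_id
    FROM sources n
    INNER JOIN articles a ON a.source_id = n.source_id
    WHERE n.category_id = 25
      AND n.source_id NOT IN (SELECT source_id
                              FROM news_sources_deselected
                              WHERE user_id = 5)
    ORDER BY a.article_publish_date DESC
"""

# INNER JOIN form with user_id <> 5.
inner_join = """
    SELECT a.article_id
    FROM sources n
    INNER JOIN articles a
            ON n.category_id = 25 AND a.source_id = n.source_id
    INNER JOIN news_sources_deselected nsd
            ON nsd.user_id <> 5 AND n.source_id = nsd.source_id
    ORDER BY a.article_publish_date DESC
"""

rows_not_in = [r[0] for r in conn.execute(not_in)]
rows_inner_join = [r[0] for r in conn.execute(inner_join)]
print(rows_not_in, rows_inner_join)
```

Here NOT IN returns the articles from sources 1 and 3, while the INNER JOIN returns those from sources 2 and 3. If several other users deselect the same source, the INNER JOIN would also repeat its articles once per matching row.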
Alternatively, we can join only the records we need, as user raheelshan mentioned:
select
a.article_id,
a.article_title,
a.source_id,
a.article_publish_date,
a.article_details,
n.source_name
from
(select
*
from
sources
where
category_id = 25) n
INNER JOIN
articles a
ON
a.source_id = n.source_id
INNER JOIN
(select
*
from
news_sources_deselected
where
user_id <> 5) nsd
ON
n.source_id = nsd.source_id
ORDER BY
a.article_publish_date DESC
Hope this helps.
I solved this problem by partitioning the table, but I am still open to suggestions.