Improve MySQL Query Performance

Question

I have the following query, which joins 5 related InnoDB tables to get the desired result set of 10 rows. I did my best to solve the issue by adding indexes and rewriting the query in many different ways, but I ended up with either unexpected results or a very slow query.

Here is the query:

SELECT 
    a.*,
    c.id as category_id,
    c.title as catname,
    CONCAT(u.fname, ' ', u.lname) as username,
    DATE_FORMAT(a.created, '%W %M %d, %Y - %T') as long_date,
    DATE_FORMAT(a.created, '%d/%m/%Y - %T') as short_date,
    (SELECT 
            COUNT(article_id)
        FROM
            comment
        WHERE 
            article_id = a.id) as totalcomments,
    YEAR(a.created) as year,
    MONTH(a.created) as month,
    DAY(a.created) as day
FROM
    article as a
        INNER JOIN
    article_related_categories rc ON a.id = rc.article_id
        LEFT JOIN
    category as c ON c.id = rc.category_id
        LEFT JOIN
    user as u ON u.id = a.user_id
WHERE
    rc.category_id = 1
        AND a.created <= NOW()
        AND (a.expire = '0000-00-00 00:00:00'
        OR a.expire >= NOW())
        AND a.published IS NOT NULL

ORDER BY a.created DESC
LIMIT 0 , 10

Click Here to see the explain screenshot

Currently there are over 13,000 rows in the article table, and rapid growth is expected.

The trouble is that this query takes about 3-4 seconds to execute. I suspect that the INNER JOIN causes most of the problem, but I thought I would ask here in case anyone has ideas for improving the performance of this query.

Answer 1

The nested SELECT may be slowing things down. Join on the comment table and GROUP BY a.id instead:

...
    -- count a column from comment so articles without comments get 0, not 1
    COUNT(comm.article_id) as totalcomments,
...    
FROM
    ...
    LEFT JOIN comment AS comm ON comm.article_id = a.id
WHERE
    ...
GROUP BY a.id
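
One detail to watch with this approach, shown here as a minimal sketch against the article and comment tables from the question: with a LEFT JOIN, COUNT(*) also counts the NULL-extended row of an article that has no comments, so counting a column from the comment table is what returns 0 for such articles. The sketch leaves out the other joins and filters of the original query.

```sql
-- Sketch only: per-article comment counts via LEFT JOIN + GROUP BY.
SELECT
    a.id,
    COUNT(comm.article_id) AS totalcomments  -- 0 when no comment row matched
FROM
    article AS a
        LEFT JOIN
    comment AS comm ON comm.article_id = a.id
GROUP BY a.id;
```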

Answer 2

Well a quick fix is to get rid of this

    AND a.created <= NOW()

because an article created in the future really doesn't make sense. One less thing for the db to do usually (almost always) results in faster execution.

The difficulty in answering is not knowing what you really want to get from the db. You need to think through your left joins and eliminate them where applicable. The problem is that a left join does not eliminate rows, and a smaller result set, like the one you get when rows are eliminated, returns faster simply because it is smaller.

For optimum speed I would start from the related categories table, because the where clause already narrows the results down to 1 there and I'm only looking at one distinct value for the related category.

select blah from related_categories rc 
join comment c on rc.id = c.id 
join blah b on b.id = c.id
where rc.id = 1
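
Applied to the tables in the question, the idea of starting from the category link table might look like the sketch below. It uses only columns that appear in the original query and assumes that an inner join to article is what is wanted. For inner joins the MySQL optimizer is free to pick the join order itself, so whether writing the tables in this order changes anything is something to verify with EXPLAIN.

```sql
-- Sketch only: drive the query from the link table, which the WHERE clause
-- already restricts to a single category, then join to article.
SELECT
    a.id,
    a.created
FROM
    article_related_categories AS rc
        INNER JOIN
    article AS a ON a.id = rc.article_id
WHERE
    rc.category_id = 1
ORDER BY a.created DESC
LIMIT 10;
```

If the optimizer picks a different order than intended, MySQL's STRAIGHT_JOIN keyword forces the tables to be read in the order written, which makes it possible to compare both plans.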

Answer 3

I would have the following indexes on your tables:

article table index -- ( published, expire, id )
article table index -- ( id ) just the primary key ID for secondary join criteria
article_related_categories table index -- ( article_id, category_id )
comment table index -- ( article_id )
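
As a rough sketch, the indexes listed above could be created as follows. The index names are made up for illustration, and the statements assume that article.id is already the table's primary key, so no separate index on ( id ) is created.

```sql
-- Index names below are hypothetical; use whatever naming convention you have.
CREATE INDEX idx_article_published_expire
    ON article (published, expire, id);

-- article ( id ): assumed to be covered already by the PRIMARY KEY.

CREATE INDEX idx_arc_article_category
    ON article_related_categories (article_id, category_id);

CREATE INDEX idx_comment_article
    ON comment (article_id);
```

After adding them, running EXPLAIN on the query again shows whether the optimizer actually picks them up.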

Then, have a pre-query do nothing but get the article ID and comment counts for the related category of interest, ordered and limited to the 10 articles... THEN join to the category and user tables for your final output.

SELECT
      a2.*,
      c.id as category_id,
      c.title as catname,
      CONCAT(u.fname, ' ', u.lname) as username,
      DATE_FORMAT(a2.created, '%W %M %d, %Y - %T') as long_date,
      DATE_FORMAT(a2.created, '%d/%m/%Y - %T') as short_date,
      PreQual.TotalComments,
      YEAR(a2.created) as year,
      MONTH(a2.created) as month,
      DAY(a2.created) as day
   from 
      ( select 
              a.id,
              rc.category_id,
              COUNT(c.article_id) as TotalComments
           from 
              article a
                 join article_related_categories rc 
                    ON a.id = rc.article_id
                    AND rc.category_id = 1
                 left join comment c
                    ON a.id = c.article_id
           where
                  a.published IS NOT NULL
              AND (    a.expire >= now()
                    OR a.expire = '0000-00-00 00:00:00' )
           group by
              a.id,
              rc.category_id
           order by
              a.created DESC
           limit
              0, 10 ) PreQual
        JOIN article a2
           ON PreQual.ID = a2.id
           LEFT JOIN user u
              ON a2.user_id = u.id
        LEFT JOIN category as c 
           ON PreQual.Category_ID = c.id
   order by
      a2.created DESC

Now, even with the above query, for web-based activity (which this appears to be), doing counts over an entire subset on a correlated condition can be a HUGE performance hit. You would be better off de-normalizing the data in one respect: in your article table, add a column for CommentCount. Then, whenever a new comment is added, have an after-insert trigger on the comment table that basically does:

update article
   set CommentCount = CommentCount + 1
   where id = NEW.article_id  -- the article ID of the comment just inserted

Then you never have to go back and do a COUNT() every time. That would be your best operational move. You will have to initialize all counts before the trigger is created, but that would be a one-time correlated update. After that, you would only need to join to the article_related_categories table to apply your category criteria of interest.
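
A minimal sketch of that de-normalization, assuming the article and comment tables from the question, could look like this. The column name CommentCount comes from the answer above; the trigger names are hypothetical, and the delete trigger is an addition that only matters if comments can be removed.

```sql
-- 1. Add the counter column (plain INT so a decrement can never underflow).
ALTER TABLE article
    ADD COLUMN CommentCount INT NOT NULL DEFAULT 0;

-- 2. One-time correlated update to initialize the counts.
UPDATE article a
   SET a.CommentCount = (SELECT COUNT(*)
                           FROM comment c
                          WHERE c.article_id = a.id);

-- 3. Keep the counter current on every new comment.
--    A single-statement trigger body needs no DELIMITER change.
CREATE TRIGGER comment_after_insert      -- hypothetical name
AFTER INSERT ON comment
FOR EACH ROW
    UPDATE article
       SET CommentCount = CommentCount + 1
     WHERE id = NEW.article_id;

-- 4. Assumption: comments can also be deleted, so decrement as well.
CREATE TRIGGER comment_after_delete      -- hypothetical name
AFTER DELETE ON comment
FOR EACH ROW
    UPDATE article
       SET CommentCount = CommentCount - 1
     WHERE id = OLD.article_id;

-- 5. The listing query then reads the stored count instead of counting.
SELECT
    a.id,
    a.created,
    a.CommentCount AS totalcomments
FROM
    article AS a
        INNER JOIN
    article_related_categories AS rc ON rc.article_id = a.id
WHERE
    rc.category_id = 1
ORDER BY a.created DESC
LIMIT 10;
```

The remaining filters from the original query (published, expire) are left out of step 5 for brevity and would be added back to the WHERE clause.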

3 answers

solution1
0 2014-01-22 23:31:11

solution2
0 2014-01-23 00:32:52

solution3
0 2014-01-23 02:02:58
