Optimising a MySQL query with a SUM in the sub-query

Question

I'm trying to do a very specific thing in WordPress: expire posts over 30 days old that have no "likes" (or negative "likes") based on someone else's plugin. That plugin stores individual likes/dislikes for each user/post in a separate table (+1/-1), which means that my selection criteria are complex, based on a SUM.

Doing the SELECT is easy, as it is a simple JOIN on post ID with a "HAVING" clause to detect a total likes value of less than one. It looks like this (with all the table names simplified for readability):

SELECT posts.id, SUM( wti_like_post.value )
FROM posts
JOIN wti_like_post
ON posts.ID = wti_like_post.post_id
WHERE posts.post_date < DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY posts.ID
HAVING SUM( wti_like_post.value ) < 1

But I'm stuck on optimising the UPDATE query. The unoptimised version takes 2 minutes to run, which is unacceptable.

UPDATE posts
SET posts.post_status = 'trash'
WHERE posts.post_status = 'publish'
AND posts.post_type = 'post'
AND posts.post_date < DATE_SUB(NOW(), INTERVAL 30 DAY)
AND ID IN
(SELECT post_id FROM wti_like_post
 GROUP BY post_id
 HAVING SUM( wti_like_post.value ) < 1 )

This is obviously because of my inability to create an UPDATE query with a join based on a SUM result - I simply don't know how to do that (believe me, I've tried!).

If anyone could optimise that UPDATE for me, I'd be terribly grateful. It'd also teach me how to do it properly, which would be neat!

Thanks in advance.

Answer 1

It also depends on the number of posts. In addition, your subquery sums the likes for every post, including posts that are already trashed, so the filters should go inside the subquery rather than in the outer UPDATE. Try this one:

UPDATE posts
SET posts.post_status = 'trash'
WHERE ID IN
(
-- The extra derived table is needed because MySQL refuses to select
-- directly from the table being updated (error 1093).
SELECT ID FROM
(
SELECT posts.ID
FROM posts
INNER JOIN wti_like_post
ON (posts.ID = wti_like_post.post_id AND posts.post_status = 'publish'
AND posts.post_type = 'post')
WHERE posts.post_date < DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY posts.ID
HAVING SUM( wti_like_post.value ) < 1
) AS expired
)
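For reference, the "UPDATE with a join based on a SUM result" that the question asks about could look roughly like the sketch below. It aggregates the votes once in a derived table and joins that to posts, rather than using IN. The alias expired is made up for illustration and the table names are the simplified ones from the question. Whether it is actually faster depends on the MySQL version and the indexes on wti_like_post, so treat it as something to try, not a guaranteed fix.

```sql
-- Sketch only: `expired` is a hypothetical alias; table names follow the question.
UPDATE posts
JOIN (
    -- Aggregate the votes once: one row per post whose total is below 1
    SELECT post_id
    FROM wti_like_post
    GROUP BY post_id
    HAVING SUM(value) < 1
) AS expired
    ON expired.post_id = posts.ID
SET posts.post_status = 'trash'
WHERE posts.post_status = 'publish'
  AND posts.post_type = 'post'
  AND posts.post_date < DATE_SUB(NOW(), INTERVAL 30 DAY);
```

Because the derived table only reads wti_like_post, it does not run into the restriction on selecting from the table being updated.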

Answer 2

This may sound stupid, but you could create a table from the SELECT, put an index on it, and then use a standard JOIN against that new table in the UPDATE.

I guess that even if you build it on the fly every time, it should be faster than the non-indexed version.

EDIT: Here is the code. Sorry, it's off the top of my head and I haven't checked that it runs, but it should at least give you an idea of what I mean.

CREATE TABLE joinHelper(
  id INT NOT NULL,
  PRIMARY KEY ( id )
);
INSERT INTO joinHelper(id)
SELECT post_id FROM wti_like_post
GROUP BY post_id
HAVING SUM( wti_like_post.value ) < 1;

UPDATE posts JOIN joinHelper ON (posts.ID = joinHelper.id)
SET posts.post_status = 'trash'
WHERE posts.post_status = 'publish'
AND posts.post_type = 'post'
AND posts.post_date < DATE_SUB(NOW(), INTERVAL 30 DAY)
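If the helper table is rebuilt on every run, a plain CREATE TABLE will fail the second time because the table already exists. One possible variation, assuming a session-scoped helper is acceptable, is to make it a temporary table and drop it afterwards. This is only a sketch: joinHelper is the name from the code above, and the INT column type is carried over from it and may need adjusting to match posts.ID.

```sql
-- Sketch only: same idea as above, but with a session-scoped helper table.
DROP TEMPORARY TABLE IF EXISTS joinHelper;

CREATE TEMPORARY TABLE joinHelper (
    id INT NOT NULL,   -- assumption: adjust to match the type of posts.ID
    PRIMARY KEY (id)
);

-- One row per post whose summed votes are below 1
INSERT INTO joinHelper (id)
SELECT post_id
FROM wti_like_post
GROUP BY post_id
HAVING SUM(value) < 1;

UPDATE posts
JOIN joinHelper ON posts.ID = joinHelper.id
SET posts.post_status = 'trash'
WHERE posts.post_status = 'publish'
  AND posts.post_type = 'post'
  AND posts.post_date < DATE_SUB(NOW(), INTERVAL 30 DAY);

DROP TEMPORARY TABLE joinHelper;
```

A temporary table is visible only to the connection that created it, so concurrent runs do not collide on the name.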

2 answers

solution1
0 2013-08-03 14:53:41

solution2
-1 2013-08-03 18:43:34
