I have a table sample with two columns, id and cnt, and another table, PostTags, with two columns, postid and tagid.
I want to update all cnt values with their corresponding counts and I have written the following query:
UPDATE sample SET
cnt = (SELECT COUNT(tagid)
FROM PostTags
WHERE sample.postid = PostTags.postid
GROUP BY PostTags.postid)
I intend to update the entire column at once, and the query seems to accomplish this. But performance-wise, is this the best way, or is there a better one?
EDIT
I've been running this query (without GROUP BY) for over an hour on ~18m records. I'm looking for a query that performs better.
Remove the unnecessary GROUP BY and the statement looks good. If, however, you expect many sample.cnt values to already be correct, the statement would update many records that need no update. This may create some overhead (larger rollback segments, triggers being executed, etc.) and thus take longer.
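For reference, here is a sketch of the question's statement with the GROUP BY dropped. It is not only shorter: an aggregate without GROUP BY always returns exactly one row, so posts with no matching tags get 0, whereas the GROUP BY version returns no row for them and the scalar subquery sets cnt to NULL.

```sql
-- Question's statement with the GROUP BY removed.
-- COUNT() without GROUP BY yields one row per outer row,
-- so posts without tags end up with cnt = 0 rather than NULL.
UPDATE sample
SET cnt = (SELECT COUNT(tagid)
           FROM PostTags
           WHERE PostTags.postid = sample.postid);
```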
To update only the records that need updating, use a join:
UPDATE sample
INNER JOIN
(
SELECT postid, COUNT(tagid) as cnt
FROM PostTags
GROUP BY postid
) tags ON tags.postid = sample.postid
SET sample.cnt = tags.cnt
WHERE sample.cnt != tags.cnt OR sample.cnt IS NULL;
Here is the SQL fiddle: http://sqlfiddle.com/#!2/d5e88.
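Before running the joined UPDATE on a large table, it may help to check how many rows it would touch. The following is a minimal sketch that reuses the same join and WHERE clause in a SELECT; the alias rows_to_update is only an illustrative name.

```sql
-- Counts the rows the joined UPDATE would modify; it changes nothing.
SELECT COUNT(*) AS rows_to_update
FROM sample
INNER JOIN
(
  SELECT postid, COUNT(tagid) AS cnt
  FROM PostTags
  GROUP BY postid
) tags ON tags.postid = sample.postid
WHERE sample.cnt != tags.cnt OR sample.cnt IS NULL;
```

If the result is close to the total row count of sample, the extra WHERE clause saves little and the simpler statement is probably just as good.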
That query should not take an hour. I just did a test, running a query like yours on a table of 87520 keywords and matching rows in a many-to-many table of 2776445 movie_keyword rows. In my test, it took 32 seconds.
The crucial part that you're probably missing is that you must have an index on the lookup column, which is PostTags.postid in your example.
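As a sketch for the tables in the question, the index could be checked for and created roughly like this. The index name idx_posttags_postid_tagid is made up for illustration, and including tagid as a second column is an assumption meant to let COUNT(tagid) be answered from the index alone.

```sql
-- List the indexes already on the table; if postid is already the leading
-- column of one (for example in a composite primary key), no new index is needed.
SHOW INDEX FROM PostTags;

-- Hypothetical index name. postid comes first because it is the lookup column.
CREATE INDEX idx_posttags_postid_tagid ON PostTags (postid, tagid);
```

Building the index on a large table takes some time by itself, but it is a one-off cost.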
Here's the EXPLAIN from my test (finally we can do EXPLAIN on UPDATE statements in MySQL 5.6):
mysql> explain update kc1 set count =
(select count(*) from movie_keyword
where kc1.keyword_id = movie_keyword.keyword_id) \G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: kc1
type: index
possible_keys: NULL
key: PRIMARY
key_len: 4
ref: NULL
rows: 98867
Extra: Using temporary
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: movie_keyword
type: ref
possible_keys: k_m
key: k_m
key_len: 4
ref: imdb.kc1.keyword_id
rows: 17
Extra: Using index
Having an index on keyword_id is important. In my case, I had a compound index, but a single-column index would help too.
CREATE TABLE `movie_keyword` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`movie_id` int(11) NOT NULL,
`keyword_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `k_m` (`keyword_id`,`movie_id`)
);
The difference between COUNT(*) and COUNT(movie_id) should be immaterial, assuming movie_id is declared NOT NULL. But I use COUNT(*) because it will still count as an index-only query even if my index is defined only on the keyword_id column.
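To see whether the two forms would differ on the question's tables, a quick comparison like the following can be run. This is only a sketch: the postid value 1 is an arbitrary example, and the column aliases are illustrative.

```sql
-- COUNT(*) counts rows; COUNT(tagid) counts only rows where tagid IS NOT NULL.
-- The two results are equal whenever tagid has no NULLs for this postid.
SELECT COUNT(*)     AS all_rows,
       COUNT(tagid) AS non_null_tagids
FROM PostTags
WHERE postid = 1;  -- arbitrary example value
```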