Is there a better way of doing this in MySQL? - update entire column with another select and group by
I have a table sample with two columns, id and cnt, and another table PostTags with two columns, postid and tagid.
I want to update all cnt values with their corresponding counts, and I have written the following query:
UPDATE sample SET
cnt = (SELECT COUNT(tagid)
FROM PostTags
WHERE sample.postid = PostTags.postid
GROUP BY PostTags.postid)
I intend to update the entire column at once, and this seems to accomplish it. But performance-wise, is this the best way, or is there a better one?
EDIT
I've been running this query (without GROUP BY) for over 1 hour on ~18m records. I'm looking for a query that performs better.
Remove the unnecessary GROUP BY and the statement looks good. However, if you expect many sample.cnt values to already be correct, you would be updating many records that need no update. This may create some overhead (larger rollback segments, triggers executed, etc.) and thus take longer.
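For reference, the statement with the GROUP BY removed might look like the sketch below. It assumes the column names used in the question's query (sample.postid and PostTags.postid). One behavioral difference to be aware of: without GROUP BY the subquery always returns exactly one row, so posts with no tags get 0, whereas with GROUP BY the subquery returns no row for them and cnt is set to NULL.

```sql
-- Correlated subquery without GROUP BY.
-- COUNT(tagid) over an empty set yields 0, so untagged posts get cnt = 0.
UPDATE sample
SET cnt = (SELECT COUNT(tagid)
           FROM PostTags
           WHERE PostTags.postid = sample.postid);
```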
In order to update only the records that need updating, use a join:
UPDATE sample
INNER JOIN
(
SELECT postid, COUNT(tagid) as cnt
FROM PostTags
GROUP BY postid
) tags ON tags.postid = sample.postid
SET sample.cnt = tags.cnt
WHERE sample.cnt != tags.cnt OR sample.cnt IS NULL;
Here is the SQL fiddle: http://sqlfiddle.com/#!2/d5e88
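Note that an INNER JOIN only touches sample rows that have at least one matching row in PostTags, so a post whose tags were all removed would keep its old cnt. If those rows should be reset to 0, a LEFT JOIN variant along the following lines may work. This is a sketch using the same table and column names as the query above, not something taken from the fiddle.

```sql
-- LEFT JOIN keeps sample rows with no tags; COALESCE maps their missing count to 0.
UPDATE sample
LEFT JOIN
(
    SELECT postid, COUNT(tagid) AS cnt
    FROM PostTags
    GROUP BY postid
) tags ON tags.postid = sample.postid
SET sample.cnt = COALESCE(tags.cnt, 0)
-- Still skip rows whose stored value is already correct.
WHERE sample.cnt IS NULL OR sample.cnt <> COALESCE(tags.cnt, 0);
```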
That query should not take an hour. I just did a test, running a query like yours on a table of 87520 keywords and matching rows in a many-to-many table of 2776445 movie_keyword rows. In my test, it took 32 seconds.
The crucial part that you're probably missing is that you must have an index on the lookup column, which is PostTags.postid in your example.
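As a rough sketch, adding such an index to the question's table and checking the plan could look like this. The index name idx_posttags_postid is made up for illustration; the table and column names come from the question.

```sql
-- Hypothetical index name; postid is the lookup column used by the subquery.
CREATE INDEX idx_posttags_postid ON PostTags (postid);

-- Inspect the plan for the update (EXPLAIN on UPDATE needs MySQL 5.6 or later).
-- The subquery row should show the new index in the "key" column.
EXPLAIN UPDATE sample
SET cnt = (SELECT COUNT(*)
           FROM PostTags
           WHERE PostTags.postid = sample.postid);
```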
Here's the EXPLAIN from my test (finally we can do EXPLAIN on UPDATE statements in MySQL 5.6):
mysql> explain update kc1 set count =
(select count(*) from movie_keyword
where kc1.keyword_id = movie_keyword.keyword_id) \G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: kc1
type: index
possible_keys: NULL
key: PRIMARY
key_len: 4
ref: NULL
rows: 98867
Extra: Using temporary
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: movie_keyword
type: ref
possible_keys: k_m
key: k_m
key_len: 4
ref: imdb.kc1.keyword_id
rows: 17
Extra: Using index
Having an index on keyword_id is important. In my case, I had a compound index, but a single-column index would help too.
CREATE TABLE `movie_keyword` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`movie_id` int(11) NOT NULL,
`keyword_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `k_m` (`keyword_id`,`movie_id`)
);
The difference between COUNT(*) and COUNT(movie_id) should be immaterial, assuming movie_id is NOT NULL. But I use COUNT(*) because it'll still count as an index-only query if my index is defined only on the keyword_id column.
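To see why the NOT NULL assumption matters, here is a small self-contained illustration. The table count_demo and its column val are invented for this example and are not part of the test schema above.

```sql
-- Throwaway table with a nullable column.
CREATE TEMPORARY TABLE count_demo (val INT NULL);
INSERT INTO count_demo (val) VALUES (1), (NULL), (3);

-- COUNT(*) counts rows; COUNT(val) counts only rows where val is not NULL.
SELECT COUNT(*)   AS all_rows,       -- 3
       COUNT(val) AS non_null_vals   -- 2
FROM count_demo;
```

With a NOT NULL column the two counts are always equal, which is why the choice is immaterial in the movie_keyword case.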