MySQL select max record from each group and insert into another table
There are 4 columns in table A: id, name, create_time and content.
create table A
(
id int primary key,
name varchar(20),
create_time datetime,
content varchar(4000)
);
create table B like A;
For each name, I want to select the record with the max create_time and insert it into another table B.
I run the SQL below, but it takes an unacceptably long time.
insert into B
select A.*
from A,
(select name, max(create_time) create_time from A group by name) tmp
where A.name = tmp.name
and A.create_time = tmp.create_time;
Table A has 10 million rows (about 10 GB), and the statement takes 200 s to execute.
Is there any way to do this job faster, or are there MySQL Server parameters I can change to make it run faster?
P.S. Table A can be of any type, a partitioned table or something else.
First make sure you have a proper index on A (name, create_time) and B (name, create_time), then try using an explicit join with an ON condition:
insert into B
select A.*
from A
inner join (
select name, max(create_time) create_time
from A -- group over A; B is the (initially empty) target table
group by name) tmp on ( A.name = tmp.name and A.create_time = tmp.create_time)
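As a quick sanity check of the join above, here is a minimal sketch that runs it on three toy rows. It rests on several assumptions not in the original post: Python's built-in sqlite3 module with an in-memory SQLite database stands in for MySQL, table B is created with an explicit column list because SQLite has no `CREATE TABLE ... LIKE`, the index name `idx_a_name_create_time` is hypothetical, and the subquery reads from A. It only illustrates the result of the statement and says nothing about timing on a 10-million-row table.

```python
import sqlite3

# Assumption: in-memory SQLite is only a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, name VARCHAR(20),
                    create_time DATETIME, content VARCHAR(4000));
    CREATE TABLE B (id INTEGER PRIMARY KEY, name VARCHAR(20),
                    create_time DATETIME, content VARCHAR(4000));
    -- Hypothetical index name; a composite index on (name, create_time)
    -- is what the answer recommends for the GROUP BY and the join.
    CREATE INDEX idx_a_name_create_time ON A (name, create_time);
""")
conn.executemany("INSERT INTO A VALUES (?, ?, ?, ?)", [
    (1, "x", "2020-01-01 00:00:00", "old x"),
    (2, "x", "2020-01-02 00:00:00", "new x"),
    (3, "y", "2020-01-01 00:00:00", "only y"),
])

# Same shape as the answer's statement: join A to its per-name maximum.
conn.execute("""
    INSERT INTO B
    SELECT A.*
    FROM A
    INNER JOIN (SELECT name, MAX(create_time) AS create_time
                FROM A
                GROUP BY name) tmp
      ON A.name = tmp.name AND A.create_time = tmp.create_time
""")

rows_b = conn.execute("SELECT id, name FROM B ORDER BY id").fetchall()
print(rows_b)  # [(2, 'x'), (3, 'y')]
```

Only the newest row per name ends up in B.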
The query you need is:
INSERT INTO B
SELECT m.*
FROM A m # m from "max"
LEFT JOIN A l # l from "later"
ON m.name = l.name # the same name
AND m.create_time < l.create_time # "l" was created later than "m"
WHERE l.name IS NULL # there is no "later"
It joins A aliased as m (from "max") against itself aliased as l (from "later" than "max"). The LEFT JOIN ensures that, in the absence of a WHERE clause, all the rows from m are present in the result set. Each row from m is combined with all rows from l that have the same name (m.name = l.name) and were created after the row from m (m.create_time < l.create_time). The WHERE condition keeps in the result set only the rows from m that have no match in l (there is no record with the same name and a later creation time).
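To make the mechanics concrete, the following sketch prints the intermediate LEFT JOIN result (before the WHERE filter) and then the filtered result. It assumes Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, and three made-up rows; the self-join itself is the one from the answer.

```python
import sqlite3

# Assumption: in-memory SQLite is only a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT, "
             "create_time TEXT, content TEXT)")
conn.executemany("INSERT INTO A VALUES (?, ?, ?, ?)", [
    (1, "x", "2020-01-01 00:00:00", "old x"),
    (2, "x", "2020-01-02 00:00:00", "new x"),
    (3, "y", "2020-01-01 00:00:00", "only y"),
])

# Without WHERE: every row of m appears; l is NULL when nothing is later.
joined = conn.execute("""
    SELECT m.id, l.id
    FROM A m
    LEFT JOIN A l
      ON m.name = l.name
     AND m.create_time < l.create_time
    ORDER BY m.id
""").fetchall()
print(joined)  # [(1, 2), (2, None), (3, None)]

# With WHERE: only the rows of m that have no later row survive.
latest_ids = [row[0] for row in conn.execute("""
    SELECT m.id
    FROM A m
    LEFT JOIN A l
      ON m.name = l.name
     AND m.create_time < l.create_time
    WHERE l.name IS NULL
    ORDER BY m.id
""")]
print(latest_ids)  # [2, 3]
```

Row 1 is paired with the later row 2 and is therefore filtered out, while rows 2 and 3 have no later partner and are kept.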
If several rows in A have the same name and the same (maximum) create_time, the query returns all of them. To keep only one of them, an additional condition is required.
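Add:
OR (m.create_time = l.create_time AND m.id < l.id)
to the ON clause (right before WHERE), and wrap it in parentheses together with the m.create_time < l.create_time test. AND binds more tightly than OR, so without the parentheses the new branch would also match rows that have a different name. Adjust/replace the m.id < l.id part of the condition to suit your needs (as written, it keeps the tied row with the largest id, which is the one inserted last if ids are assigned in increasing order).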
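The tie-breaker can be checked on a small example. The sketch below assumes Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (both give AND higher precedence than OR), plus four made-up rows in which two "x" rows tie on create_time and a "z" row happens to share the same timestamp. It runs the ON clause once with the time tests grouped in parentheses and once without, to show why the grouping matters.

```python
import sqlite3

# Assumption: in-memory SQLite is only a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT, "
             "create_time TEXT, content TEXT)")
conn.executemany("INSERT INTO A VALUES (?, ?, ?, ?)", [
    (1, "x", "2020-01-02 00:00:00", "x, tie 1"),
    (2, "x", "2020-01-02 00:00:00", "x, tie 2"),
    (3, "y", "2020-01-01 00:00:00", "only y"),
    (4, "z", "2020-01-02 00:00:00", "only z"),
])

# Tie-breaker grouped with the time test: one row per name survives.
grouped = [row[0] for row in conn.execute("""
    SELECT m.id
    FROM A m
    LEFT JOIN A l
      ON m.name = l.name
     AND (m.create_time < l.create_time
          OR (m.create_time = l.create_time AND m.id < l.id))
    WHERE l.name IS NULL
    ORDER BY m.id
""")]
print(grouped)  # [2, 3, 4]

# Same text without the outer parentheses: the OR branch ignores the name,
# so row 2 ("x") is matched by row 4 ("z") and wrongly dropped.
ungrouped = [row[0] for row in conn.execute("""
    SELECT m.id
    FROM A m
    LEFT JOIN A l
      ON m.name = l.name
     AND m.create_time < l.create_time
      OR (m.create_time = l.create_time AND m.id < l.id)
    WHERE l.name IS NULL
    ORDER BY m.id
""")]
print(ungrouped)  # [3, 4]
```

With the grouping, the tied "x" rows collapse to the one with the larger id (row 2). Without it, name "x" disappears from the result entirely.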
Make sure table A has an index on the columns used by the query (name and create_time). Otherwise the performance improvement over your original query will not be significant.