
MySQL select max record from each group and insert into another table

Table A has four columns: id, name, create_time and content.

create table A
(
    id int primary key,
    name varchar(20),
    create_time datetime,
    content varchar(4000)
);
create table B like A;

For each name, I want to select the record with the latest create_time and insert it into another table, B.

I run the following SQL, but it takes an unacceptably long time:

insert into B
select A.*
from A,
    (select name, max(create_time) create_time from A group by name) tmp
where A.name = tmp.name
  and A.create_time = tmp.create_time;
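A minimal sketch of this INSERT ... SELECT on a tiny made-up data set, using SQLite through Python's standard sqlite3 module as a stand-in for MySQL. It assumes the grouped subquery reads from A: the target table B starts empty, so taking the maximum from B would select nothing. SQLite has no CREATE TABLE ... LIKE, so B is declared explicitly with the same columns.

```python
import sqlite3

# Made-up sample rows for table A: (id, name, create_time, content).
# ISO-8601 strings compare chronologically, so MAX() works on them.
ROWS = [
    (1, "alice", "2024-01-01 10:00:00", "a1"),
    (2, "alice", "2024-01-03 09:00:00", "a2"),
    (3, "bob", "2024-01-02 08:00:00", "b1"),
    (4, "bob", "2024-01-01 12:00:00", "b2"),
    (5, "carol", "2024-01-05 00:00:00", "c1"),
]


def make_db() -> sqlite3.Connection:
    """Create in-memory tables A and B (SQLite stand-in for the MySQL schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite has no CREATE TABLE ... LIKE, so B is declared with the same columns.
    for table in ("A", "B"):
        conn.execute(
            f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT, "
            "create_time TEXT, content TEXT)"
        )
    conn.executemany("INSERT INTO A VALUES (?, ?, ?, ?)", ROWS)
    return conn


def copy_latest_per_name(conn: sqlite3.Connection) -> list:
    """Run the question's INSERT ... SELECT (max taken from A) and return B's rows."""
    conn.execute(
        """
        INSERT INTO B
        SELECT A.*
        FROM A,
             (SELECT name, MAX(create_time) AS create_time
              FROM A
              GROUP BY name) tmp
        WHERE A.name = tmp.name
          AND A.create_time = tmp.create_time
        """
    )
    return conn.execute("SELECT * FROM B ORDER BY id").fetchall()


if __name__ == "__main__":
    print(copy_latest_per_name(make_db()))
```

With this data, B ends up holding rows 2, 3 and 5, one per name.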

Table A has about 10 million rows (10 GB), and the statement takes 200 s to execute.

Is there any way to do this job faster, or which MySQL Server parameters can I change to make it run faster?

P.S.: Table A can be of any type: a partitioned table or anything else.

First, make sure you have proper indexes on A (name, create_time) and B (name, create_time), then try using an explicit JOIN with an ON condition:

insert into B 
select A.* 
from A 
inner join ( 
    select name, max(create_time) create_time 
    from A 
    group by name) tmp on (A.name = tmp.name and A.create_time = tmp.create_time);
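For inner joins, moving the join condition from WHERE into an explicit JOIN ... ON does not change which rows are returned, so any speed-up from this rewrite is expected to come from the index rather than from the join syntax. A minimal sketch under stated assumptions: SQLite via Python's sqlite3 module stands in for MySQL, the rows are made up, the grouped subquery reads from A, and the index name idx_a_name_time is hypothetical. It checks that the comma join and the explicit join return the same rows.

```python
import sqlite3

# Made-up sample rows for table A: (id, name, create_time, content).
ROWS = [
    (1, "alice", "2024-01-01 10:00:00", "a1"),
    (2, "alice", "2024-01-03 09:00:00", "a2"),
    (3, "bob", "2024-01-02 08:00:00", "b1"),
    (4, "bob", "2024-01-01 12:00:00", "b2"),
    (5, "carol", "2024-01-05 00:00:00", "c1"),
]

# Derived table with the per-name maximum, shared by both join styles.
LATEST = "(SELECT name, MAX(create_time) AS create_time FROM A GROUP BY name) tmp"

# Implicit (comma) join: the join condition lives in WHERE.
COMMA_JOIN = f"""
    SELECT A.* FROM A, {LATEST}
    WHERE A.name = tmp.name AND A.create_time = tmp.create_time
    ORDER BY A.id
"""

# Explicit join: the same condition moved into ON.
EXPLICIT_JOIN = f"""
    SELECT A.* FROM A
    INNER JOIN {LATEST}
        ON (A.name = tmp.name AND A.create_time = tmp.create_time)
    ORDER BY A.id
"""


def make_db() -> sqlite3.Connection:
    """Build an in-memory table A with a composite (name, create_time) index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT, "
        "create_time TEXT, content TEXT)"
    )
    conn.executemany("INSERT INTO A VALUES (?, ?, ?, ?)", ROWS)
    # Composite index on the grouping and join columns; the name is hypothetical.
    conn.execute("CREATE INDEX idx_a_name_time ON A (name, create_time)")
    return conn


if __name__ == "__main__":
    conn = make_db()
    comma = conn.execute(COMMA_JOIN).fetchall()
    explicit = conn.execute(EXPLICIT_JOIN).fetchall()
    print(comma == explicit, [row[0] for row in comma])
```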

The query you need is:

INSERT INTO B
SELECT m.*
FROM A m                                      # m from "max"
LEFT JOIN A l                                 # l from "later"
    ON m.name = l.name                        # the same name
        AND m.create_time < l.create_time     # "l" was created later than "m"
WHERE l.name IS NULL;                         # there is no "later"

How it works:

It joins A, aliased as m (from "max"), against itself, aliased as l (from "later" than "max"). The LEFT JOIN ensures that, in the absence of a WHERE clause, every row from m is present in the result set. Each row from m is combined with all rows from l that have the same name (m.name = l.name) and were created after the row from m (m.create_time < l.create_time). The WHERE condition keeps in the result set only the rows from m that have no match in l (there is no record with the same name and a later creation time).
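To make the explanation concrete, a minimal sketch (SQLite via Python's sqlite3 as a stand-in for MySQL, made-up rows) runs the same self-join twice: once without the WHERE filter, to show each m row next to the id of a later l row (None when there is none), and once with WHERE l.name IS NULL, which keeps exactly the rows paired with None. MySQL's # comments are rewritten as -- because SQLite does not accept #.

```python
import sqlite3

# Made-up sample rows for table A: (id, name, create_time, content).
ROWS = [
    (1, "alice", "2024-01-01 10:00:00", "a1"),
    (2, "alice", "2024-01-03 09:00:00", "a2"),
    (3, "bob", "2024-01-02 08:00:00", "b1"),
    (4, "bob", "2024-01-01 12:00:00", "b2"),
    (5, "carol", "2024-01-05 00:00:00", "c1"),
]

# Self-join without the WHERE filter: each m row paired with a later l row, or None.
PAIRS = """
    SELECT m.id, l.id
    FROM A m                               -- m from "max"
    LEFT JOIN A l                          -- l from "later"
        ON m.name = l.name                 -- the same name
        AND m.create_time < l.create_time  -- l was created later than m
    ORDER BY m.id, l.id
"""

# The answer's query: keep only the m rows that found no later l row.
ANTI_JOIN = """
    SELECT m.*
    FROM A m
    LEFT JOIN A l
        ON m.name = l.name
        AND m.create_time < l.create_time
    WHERE l.name IS NULL                   -- there is no "later" row
    ORDER BY m.id
"""


def make_db() -> sqlite3.Connection:
    """Build an in-memory table A (SQLite stand-in for the MySQL table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT, "
        "create_time TEXT, content TEXT)"
    )
    conn.executemany("INSERT INTO A VALUES (?, ?, ?, ?)", ROWS)
    return conn


if __name__ == "__main__":
    conn = make_db()
    print(conn.execute(PAIRS).fetchall())
    print(conn.execute(ANTI_JOIN).fetchall())
```

On this data the unfiltered join pairs row 1 with row 2 and row 4 with row 3; rows 2, 3 and 5 have no later row and are the ones the full query returns.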

Discussion

If more than one row in A has the same name and create_time, the query returns all of them. To keep only one of them, an additional condition is required.

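Change the ON clause to:

ON m.name = l.name
    AND (m.create_time < l.create_time
         OR (m.create_time = l.create_time AND m.id < l.id))

The parentheses matter: AND binds more tightly than OR, so appending a bare OR (...) right before WHERE would also match rows of l with a different name. Adjust or replace the m.id < l.id part of the condition to suit your needs (this version keeps the row with the largest id among the rows that share the latest create_time).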
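A minimal sketch of the tie-breaker, assuming SQLite via Python's sqlite3 as a stand-in for MySQL and made-up rows in which alice has two rows with the same latest time and carol's row shares a timestamp with bob's latest row. It compares the tie-breaker grouped in parentheses with a bare OR appended to the ON clause; because AND binds more tightly than OR, the bare version's second branch ignores the name and can drop a group's latest row.

```python
import sqlite3

# Made-up rows: alice has two rows at the same latest time (ids 1 and 2), and
# carol's only row (id 5) has the same create_time as bob's latest row (id 3).
ROWS = [
    (1, "alice", "2024-01-03 09:00:00", "a1"),
    (2, "alice", "2024-01-03 09:00:00", "a2"),
    (3, "bob", "2024-01-02 08:00:00", "b1"),
    (4, "bob", "2024-01-01 12:00:00", "b2"),
    (5, "carol", "2024-01-02 08:00:00", "c1"),
]

# Tie-breaker grouped in parentheses: m.name = l.name applies to both branches.
GROUPED = """
    SELECT m.id
    FROM A m
    LEFT JOIN A l
        ON m.name = l.name
        AND (m.create_time < l.create_time
             OR (m.create_time = l.create_time AND m.id < l.id))
    WHERE l.name IS NULL
    ORDER BY m.id
"""

# Pitfall: a bare OR appended to the ON clause. Since AND binds tighter than OR,
# this parses as (same name AND later) OR (same time AND larger id, any name).
BARE_OR = """
    SELECT m.id
    FROM A m
    LEFT JOIN A l
        ON m.name = l.name
        AND m.create_time < l.create_time
        OR (m.create_time = l.create_time AND m.id < l.id)
    WHERE l.name IS NULL
    ORDER BY m.id
"""


def latest_ids(sql: str) -> list:
    """Load ROWS into an in-memory table A and return the ids selected by sql."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT, "
        "create_time TEXT, content TEXT)"
    )
    conn.executemany("INSERT INTO A VALUES (?, ?, ?, ?)", ROWS)
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    print("grouped:", latest_ids(GROUPED))
    print("bare OR:", latest_ids(BARE_OR))
```

Here the grouped version returns ids 2, 3 and 5, while the bare OR version returns only 2 and 5: bob's latest row is discarded because carol's row has the same create_time and a larger id.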

Make sure table A has an index on the columns used by the query (name and create_time), ideally a composite index on (name, create_time). Otherwise the performance improvement over your original query will not be significant.
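In MySQL, EXPLAIN shows whether the self-join actually uses an index. As a rough, runnable stand-in, the sketch below asks SQLite (through Python's sqlite3) for EXPLAIN QUERY PLAN with and without a composite index on (name, create_time); the index name idx_a_name_time and the rows are hypothetical, and SQLite's planner output is not the same as MySQL's.

```python
import sqlite3

# Made-up sample rows for table A: (id, name, create_time, content).
ROWS = [
    (1, "alice", "2024-01-01 10:00:00", "a1"),
    (2, "alice", "2024-01-03 09:00:00", "a2"),
    (3, "bob", "2024-01-02 08:00:00", "b1"),
    (4, "bob", "2024-01-01 12:00:00", "b2"),
    (5, "carol", "2024-01-05 00:00:00", "c1"),
]

# The answer's self-join (MySQL "#" comments dropped; SQLite does not accept them).
ANTI_JOIN = """
    SELECT m.*
    FROM A m
    LEFT JOIN A l
        ON m.name = l.name
        AND m.create_time < l.create_time
    WHERE l.name IS NULL
"""


def make_db(with_index: bool) -> sqlite3.Connection:
    """Build an in-memory table A, optionally with a composite index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT, "
        "create_time TEXT, content TEXT)"
    )
    conn.executemany("INSERT INTO A VALUES (?, ?, ?, ?)", ROWS)
    if with_index:
        # Hypothetical index name; (name, create_time) matches the equality-then-range
        # lookup the join performs on l (name = ? AND create_time > ?).
        conn.execute("CREATE INDEX idx_a_name_time ON A (name, create_time)")
    return conn


def query_plan(conn: sqlite3.Connection) -> list:
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + ANTI_JOIN)]


if __name__ == "__main__":
    for with_index in (False, True):
        print(with_index, query_plan(make_db(with_index)))
```

With the index, the lookup on l is expected to mention idx_a_name_time; without it, SQLite may build a temporary automatic index or scan the table instead.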

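Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.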

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM