MySQL query for getting latest record for each entry from table with 10 million rows
Use Case:
I have a table, let's say "manufacturers":
manuf_code  manuf_display_name  record_status  record_timestamp
----------  ------------------  -------------  -------------------
M000001     Sam                 N              2017-09-13 12:13:16
M000002     JII                 N              2017-09-13 15:13:15
M000002     JII                 U              2017-09-13 17:16:35
M000003     Sun                 N              2017-09-13 18:54:16
M000004     NG-Graphics         N              2017-09-13 19:13:15
M000004     NG-Graphics         U              2017-09-14 20:16:50
M000004     NG-Graphics         U              2017-09-14 09:13:25
M000005     HewNett             N              2017-09-15 10:24:19
M000006     HewNett             N              2017-09-15 10:24:19
M000007     HewNett             N              2017-09-15 10:24:19
M000007     HewNett             U              2017-09-15 15:10:16
M000007     HewNett             U              2017-09-17 21:35:19
M000007     HewNett             U              2017-09-17 21:37:26
Now there can be around 7-10 million such entries, with each manufacturer having:
Requirement: I need to fetch the latest entry for each manufacturer.
My query:
SELECT m.manuf_code
     , m.manuf_display_name
     , m.record_timestamp
     , m.record_status
  FROM manufacturers m
  JOIN
     ( SELECT manuf_code
            , MAX(record_timestamp) AS maxdate
         FROM manufacturers
        WHERE record_status = 'N' OR record_status = 'U'
        GROUP
           BY manuf_code) mn
    ON m.manuf_code = mn.manuf_code
   AND m.record_timestamp = mn.maxdate
I preferred the JOIN with a subquery because it was faster when fetching around 7 million rows.
However, I need this to run faster still, because after fetching this much data I may also have to INSERT the same rows into a table with a new record_status.
Please suggest.
EDIT:
CREATE TABLE `manufacturers` (
`manuf_code` varchar(20) NOT NULL,
`record_status` varchar(1) NOT NULL,
`manuf_display_name` varchar(50) NOT NULL,
`record_timestamp` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`manuf_code`, `record_timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
EXPLANATION:
A new entry has status 'N'; an update of an existing entry has status 'U'. That's it. The query should return the latest entry per manufacturer given these statuses.
Another case, specific to the requirement: we fetch the latest entry for each manufacturer, set its status to 'L', and INSERT it again.
The immediate question is addressed first, then an alternative design is discussed:
Groupwise Max
This is a "groupwise max" problem. For multi-million row tables, the typical queries are rather slow, all involving full table scans. To improve on that, see http://mysql.rjweb.org/doc.php/groupwise_max
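For reference, here is one other common way to write the same groupwise-max request. This is only a sketch: it assumes MySQL 8.0 or later (the question does not state a version), because it relies on the ROW_NUMBER() window function. It is not taken from the linked page, and it still has to read every candidate row, so compare it against the JOIN version with EXPLAIN before relying on it.

```sql
-- Sketch only; assumes MySQL 8.0+ for window functions.
-- Rank each manufacturer's rows from newest to oldest, then keep rank 1.
SELECT manuf_code,
       manuf_display_name,
       record_timestamp,
       record_status
FROM (
    SELECT m.manuf_code,
           m.manuf_display_name,
           m.record_timestamp,
           m.record_status,
           ROW_NUMBER() OVER (
               PARTITION BY m.manuf_code
               ORDER BY m.record_timestamp DESC
           ) AS rn
    FROM manufacturers AS m
    WHERE m.record_status IN ('N', 'U')
) AS ranked
WHERE rn = 1;
```

Because (manuf_code, record_timestamp) is the primary key, no two rows of one manufacturer share a timestamp, so both formulations return exactly one row per manufacturer.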
History vs Current
Another approach is to keep 2 tables:
- History of actions; this is what you currently have. It is mostly INSERTed into.
- Current status for each item. This would be trivial to fetch from. It is mostly UPDATEd. Or, better yet, use INSERT...ON DUPLICATE KEY UPDATE... so that new items can be inserted without extra statements.
You say "When user creates / updates ...". How is this being performed? I hope they are not issuing SQL statements. I suggest you consider some subroutine (in client code) or Stored Procedure (in MySQL). That way, you can hide the details of the two tables, etc, from the user.
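A rough sketch of how the two-table idea and the stored procedure could fit together follows. The table name manufacturers_current and the procedure name record_manufacturer_change are hypothetical, chosen here for illustration; the existing manufacturers table plays the role of History. DELIMITER is a mysql command-line client directive, so other clients may need a different way to submit the procedure body.

```sql
-- Hypothetical "Current" table: one row per manufacturer.
CREATE TABLE IF NOT EXISTS manufacturers_current (
    manuf_code         VARCHAR(20) NOT NULL,
    record_status      VARCHAR(1)  NOT NULL,
    manuf_display_name VARCHAR(50) NOT NULL,
    record_timestamp   DATETIME    NOT NULL,
    PRIMARY KEY (manuf_code)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

DELIMITER //

-- Hypothetical procedure: callers never touch the two tables directly.
CREATE PROCEDURE record_manufacturer_change(
    IN p_code   VARCHAR(20),
    IN p_name   VARCHAR(50),
    IN p_status VARCHAR(1)
)
BEGIN
    -- Use one timestamp so History and Current agree.
    DECLARE v_now DATETIME DEFAULT NOW();

    START TRANSACTION;

    -- History: append-only log of every action.
    INSERT INTO manufacturers
        (manuf_code, record_status, manuf_display_name, record_timestamp)
    VALUES
        (p_code, p_status, p_name, v_now);

    -- Current: insert a new manufacturer or overwrite the existing row.
    INSERT INTO manufacturers_current
        (manuf_code, record_status, manuf_display_name, record_timestamp)
    VALUES
        (p_code, p_status, p_name, v_now)
    ON DUPLICATE KEY UPDATE
        record_status      = p_status,
        manuf_display_name = p_name,
        record_timestamp   = v_now;

    COMMIT;
END//

DELIMITER ;

-- Example call: record an update for manufacturer M000002.
CALL record_manufacturer_change('M000002', 'JII', 'U');

-- Fetching the latest state then needs no grouping at all.
SELECT * FROM manufacturers_current;
```

One caveat with this sketch: the History primary key uses a DATETIME with one-second precision, so two changes to the same manufacturer within the same second would collide.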
Bulk Upload
You say lots of inserts/updates/etc are provided en masse? Load such into a temporary table (either CREATE TEMPORARY or a permanent table that you TRUNCATE and reuse). Then write a relatively small number of SQL statements to combine the data to put into Current and shovel (mostly intact) into History.
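The bulk path might look roughly like the sketch below. The names manufacturers_staging and manufacturers_current are hypothetical, and the existing manufacturers table again serves as History. The sketch uses the permanent-table-plus-TRUNCATE option because the second statement refers to the staging table twice, which MySQL does not allow for a TEMPORARY table within a single query. It also assumes the staged batch is newer than whatever Current already holds.

```sql
-- Hypothetical "Current" table, created here only if it does not exist yet.
CREATE TABLE IF NOT EXISTS manufacturers_current (
    manuf_code         VARCHAR(20) NOT NULL,
    record_status      VARCHAR(1)  NOT NULL,
    manuf_display_name VARCHAR(50) NOT NULL,
    record_timestamp   DATETIME    NOT NULL,
    PRIMARY KEY (manuf_code)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- Hypothetical reusable staging table with the same shape as History.
CREATE TABLE IF NOT EXISTS manufacturers_staging LIKE manufacturers;
TRUNCATE TABLE manufacturers_staging;

-- ... load the incoming batch into manufacturers_staging here
-- (for example with LOAD DATA or multi-row INSERTs) ...

-- Step 1: shovel the batch, intact, into History.
INSERT INTO manufacturers
    (manuf_code, record_status, manuf_display_name, record_timestamp)
SELECT manuf_code, record_status, manuf_display_name, record_timestamp
FROM manufacturers_staging;

-- Step 2: fold only the newest staged row per manufacturer into Current.
-- VALUES() works in ON DUPLICATE KEY UPDATE but is deprecated as of MySQL 8.0.20.
INSERT INTO manufacturers_current
    (manuf_code, record_status, manuf_display_name, record_timestamp)
SELECT s.manuf_code, s.record_status, s.manuf_display_name, s.record_timestamp
FROM manufacturers_staging AS s
JOIN (
    SELECT manuf_code, MAX(record_timestamp) AS maxdate
    FROM manufacturers_staging
    GROUP BY manuf_code
) AS latest
    ON  s.manuf_code = latest.manuf_code
    AND s.record_timestamp = latest.maxdate
ON DUPLICATE KEY UPDATE
    record_status      = VALUES(record_status),
    manuf_display_name = VALUES(manuf_display_name),
    record_timestamp   = VALUES(record_timestamp);
```

The groupwise max in step 2 runs over the staged batch only, which is normally far smaller than the multi-million row History table.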