
MySQL query for getting latest record for each entry from table with 10 million rows

Use Case:

I have a table, let's say "manufacturers":

manuf_code  manuf_display_name  record_status  record_timestamp
----------  ------------------  -------------  -------------------
M000001     Sam                 N              2017-09-13 12:13:16
M000002     JII                 N              2017-09-13 15:13:15
M000002     JII                 U              2017-09-13 17:16:35
M000003     Sun                 N              2017-09-13 18:54:16
M000004     NG-Graphics         N              2017-09-13 19:13:15
M000004     NG-Graphics         U              2017-09-14 20:16:50
M000004     NG-Graphics         U              2017-09-14 09:13:25
M000005     HewNett             N              2017-09-15 10:24:19
M000006     HewNett             N              2017-09-15 10:24:19
M000007     HewNett             N              2017-09-15 10:24:19
M000007     HewNett             U              2017-09-15 15:10:16
M000007     HewNett             U              2017-09-17 21:35:19
M000007     HewNett             U              2017-09-17 21:37:26

  • When a user creates a new manufacturer, the details are stored in the table with record_status 'N'.
  • When a user updates an existing manufacturer, a new row for that manufacturer code is added with record_status 'U'.

There can be around 7 to 10 million such entries, with each manufacturer having:

  • A single entry with status 'N'
  • Multiple entries with status 'U'

Requirement: I need to fetch the latest entry for each manufacturer.

My query:

SELECT m.manuf_code
     , m.manuf_display_name
     , m.record_timestamp
     , m.record_status
  FROM manufacturers m
  JOIN
     ( SELECT manuf_code
            , MAX(record_timestamp) AS maxdate
         FROM manufacturers
        WHERE record_status IN ('N', 'U')
        GROUP BY manuf_code
     ) mn
    ON m.manuf_code = mn.manuf_code
   AND m.record_timestamp = mn.maxdate;

I chose the JOIN with a subquery because it was the faster option when fetching around 7 million rows.

But I need this to run faster, because after fetching this much data I may also have to INSERT the same rows into some table with a new record_status.

Please suggest.

EDIT:

CREATE TABLE `manufacturers` (
  `manuf_code` varchar(20) NOT NULL,
  `record_status` varchar(1) NOT NULL,
  `manuf_display_name` varchar(50) NOT NULL,
  `record_timestamp` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`manuf_code`, `record_timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

EXPLANATION:

A new entry has status 'N'; an update of an existing entry has status 'U'. That's it. The query only needs to return the latest row among these.

Another case, specific to the requirement: we fetch the latest entry for each manufacturer, set the status to 'L', and INSERT those rows again.

The immediate question is addressed first, then an alternative design is discussed:

Groupwise Max

This is a "groupwise max" problem. For multi-million row tables, the typical queries are rather slow, all involving full table scans. To improve on that, see http://mysql.rjweb.org/doc.php/groupwise_max
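For comparison, here is a minimal sketch of another common way to write a groupwise max, using a window function. It assumes MySQL 8.0 or later (an assumption; the question does not state a version), and it is not taken from the linked page. Whether it beats the JOIN on a table of this size would have to be measured.

```sql
-- Assumes MySQL 8.0+; window functions are not available in 5.7.
-- Numbers each manufacturer's rows from newest to oldest, then keeps row 1.
SELECT manuf_code
     , manuf_display_name
     , record_timestamp
     , record_status
  FROM ( SELECT m.*
              , ROW_NUMBER() OVER (PARTITION BY manuf_code
                                   ORDER BY record_timestamp DESC) AS rn
           FROM manufacturers AS m
          WHERE record_status IN ('N', 'U')
       ) AS ranked
 WHERE rn = 1;
```

Unlike the JOIN on MAX(record_timestamp), this form returns exactly one row per manuf_code even if two rows were to tie on the timestamp.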

History vs Current

Another approach is to keep 2 tables:

  • History of actions; this is what you currently have. It is mostly INSERTed into.
  • Current status for each item. This would be trivial to fetch from. It is mostly UPDATEd. Or, better yet, INSERT...ON DUPLICATE KEY UPDATE... so that new items can be inserted without extra statements.
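A rough sketch of what the Current table and its upsert might look like. The table name manufacturers_current is hypothetical, and the columns are simply copied from the CREATE TABLE in the question; the existing manufacturers table plays the role of History.

```sql
-- Hypothetical "Current" table: one row per manufacturer.
CREATE TABLE manufacturers_current (
  manuf_code         VARCHAR(20) NOT NULL,
  record_status      VARCHAR(1)  NOT NULL,
  manuf_display_name VARCHAR(50) NOT NULL,
  record_timestamp   DATETIME    NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (manuf_code)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- History: every action is appended as before.
INSERT INTO manufacturers
       (manuf_code, record_status, manuf_display_name, record_timestamp)
VALUES ('M000002', 'U', 'JII', '2017-09-13 17:16:35');

-- Current: insert the row if the code is new, otherwise overwrite it.
-- VALUES(col) is deprecated from MySQL 8.0.20 in favor of a row alias,
-- but it is still accepted and also works on older versions.
INSERT INTO manufacturers_current
       (manuf_code, record_status, manuf_display_name, record_timestamp)
VALUES ('M000002', 'U', 'JII', '2017-09-13 17:16:35')
ON DUPLICATE KEY UPDATE
       record_status      = VALUES(record_status),
       manuf_display_name = VALUES(manuf_display_name),
       record_timestamp   = VALUES(record_timestamp);
```

With this in place, fetching the latest entry for every manufacturer is a plain SELECT from manufacturers_current, with no GROUP BY or self-join.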

You say "When user creates / updates ...". How is this being performed? I hope they are not issuing SQL statements. I suggest you consider some subroutine (in client code) or Stored Procedure (in MySQL). That way, you can hide the details of the two tables, etc, from the user.
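As an illustration only, a stored procedure along these lines could hide both writes behind one call. The procedure name save_manufacturer and the table manufacturers_current (assumed to have the same columns as manufacturers, with manuf_code as its primary key) are hypothetical, and the sketch has no error handling or locking against concurrent callers.

```sql
-- DELIMITER is a directive of the mysql command-line client, not server SQL.
DELIMITER //

CREATE PROCEDURE save_manufacturer(
    IN p_code VARCHAR(20),
    IN p_name VARCHAR(50)
)
BEGIN
    DECLARE v_status VARCHAR(1);
    DECLARE v_now    DATETIME DEFAULT NOW();

    START TRANSACTION;

    -- 'N' for a code that has no current row yet, 'U' otherwise.
    SELECT IF(COUNT(*) = 0, 'N', 'U')
      INTO v_status
      FROM manufacturers_current
     WHERE manuf_code = p_code;

    -- Append the action to the history table.
    -- Note: with PRIMARY KEY (manuf_code, record_timestamp), two saves of the
    -- same code within one second would collide.
    INSERT INTO manufacturers
           (manuf_code, record_status, manuf_display_name, record_timestamp)
    VALUES (p_code, v_status, p_name, v_now);

    -- Insert or overwrite the single current row.
    INSERT INTO manufacturers_current
           (manuf_code, record_status, manuf_display_name, record_timestamp)
    VALUES (p_code, v_status, p_name, v_now)
    ON DUPLICATE KEY UPDATE
           record_status      = v_status,
           manuf_display_name = p_name,
           record_timestamp   = v_now;

    COMMIT;
END //

DELIMITER ;
```

The application would then run something like CALL save_manufacturer('M000008', 'Acme'); (made-up values) and never touch either table directly.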

Bulk Upload

You say lots of inserts/updates/etc are provided en masse? Load such into a temporary table (either CREATE TEMPORARY or a permanent table that you TRUNCATE and reuse). Then write a relatively small number of SQL statements to combine the data to put into Current and shovel (mostly intact) into History.
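A possible shape for those statements, as a sketch under several assumptions: the staging table manuf_staging and the table manufacturers_current (one row per manuf_code) are hypothetical names, the existing manufacturers table serves as History, and every staged row is assumed to be newer than what Current already holds.

```sql
-- Hypothetical permanent staging table, emptied before each batch.
-- A permanent table is used because MySQL does not allow a TEMPORARY table
-- to be referenced twice in one query, which step 2 needs.
CREATE TABLE IF NOT EXISTS manuf_staging (
  manuf_code         VARCHAR(20) NOT NULL,
  record_status      VARCHAR(1)  NOT NULL,
  manuf_display_name VARCHAR(50) NOT NULL,
  record_timestamp   DATETIME    NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

TRUNCATE TABLE manuf_staging;

-- Load the batch into manuf_staging here,
-- e.g. with LOAD DATA or multi-row INSERT statements.

-- Step 1: shovel every staged row into History.
INSERT INTO manufacturers
       (manuf_code, record_status, manuf_display_name, record_timestamp)
SELECT manuf_code, record_status, manuf_display_name, record_timestamp
  FROM manuf_staging;

-- Step 2: fold only the newest staged row per manufacturer into Current.
INSERT INTO manufacturers_current
       (manuf_code, record_status, manuf_display_name, record_timestamp)
SELECT s.manuf_code, s.record_status, s.manuf_display_name, s.record_timestamp
  FROM manuf_staging AS s
  JOIN ( SELECT manuf_code
              , MAX(record_timestamp) AS maxdate
           FROM manuf_staging
          WHERE record_status IN ('N', 'U')
          GROUP BY manuf_code
       ) AS latest
    ON s.manuf_code = latest.manuf_code
   AND s.record_timestamp = latest.maxdate
 WHERE s.record_status IN ('N', 'U')
ON DUPLICATE KEY UPDATE
       record_status      = VALUES(record_status),
       manuf_display_name = VALUES(manuf_display_name),
       record_timestamp   = VALUES(record_timestamp);
```

The groupwise max in step 2 runs over the batch only, not the multi-million row History table, which is the point of staging the data first.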
