
MySQL query for getting latest record for each entry from table with 10 million rows

Use Case:

I have a table, let's say "manufacturers":

manuf_code  manuf_display_name  record_status  record_timestamp
----------  ------------------  -------------  -------------------
M000001     Sam                 N              2017-09-13 12:13:16
M000002     JII                 N              2017-09-13 15:13:15
M000002     JII                 U              2017-09-13 17:16:35
M000003     Sun                 N              2017-09-13 18:54:16
M000004     NG-Graphics         N              2017-09-13 19:13:15
M000004     NG-Graphics         U              2017-09-14 20:16:50
M000004     NG-Graphics         U              2017-09-14 09:13:25
M000005     HewNett             N              2017-09-15 10:24:19
M000006     HewNett             N              2017-09-15 10:24:19
M000007     HewNett             N              2017-09-15 10:24:19
M000007     HewNett             U              2017-09-15 15:10:16
M000007     HewNett             U              2017-09-17 21:35:19
M000007     HewNett             U              2017-09-17 21:37:26

  • When a user creates a new manufacturer, the details are stored in the table with record_status 'N'.
  • When a user updates an existing manufacturer, a new row for that manufacturer code is added with record_status 'U'.

Now there can be around 7-10 Million such entries with each manufacturer having:

  • A single entry with status as ' N '
  • Multiple entries with status as ' U '

Requirement: I need to fetch the latest entry for each manufacturer.

My query:

SELECT m.manuf_code
     , m.manuf_display_name
     , m.record_timestamp
     , m.record_status
  FROM manufacturers m
  JOIN
     ( SELECT manuf_code
            , MAX(record_timestamp) AS maxdate
         FROM manufacturers
        WHERE record_status = 'N' OR record_status = 'U'
        GROUP BY manuf_code
     ) mn
    ON m.manuf_code = mn.manuf_code
   AND m.record_timestamp = mn.maxdate

I chose a join over a subquery because the join was faster when fetching around 7 million rows.

However, I need this to run faster still, because after fetching this much data I may also have to INSERT the same rows into a table with a new record_status.

Please suggest.

EDIT:

CREATE TABLE `manufacturers` (
  `manuf_code` varchar(20) NOT NULL,
  `record_status` varchar(1) NOT NULL,
  `manuf_display_name` varchar(50) NOT NULL,
  `record_timestamp` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`manuf_code`, `record_timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

EXPLANATION:

A new entry has status 'N'. An update of an existing entry has status 'U'. That's it. The query should return the latest row per manufacturer across these two statuses.

Another case, specific to our requirement: we fetch the latest entry for each manufacturer, set the status to 'L', and INSERT those rows again.

The immediate question is addressed first, then an alternative design is discussed:

Groupwise Max

This is a "groupwise max" problem. For multi-million row tables, the typical queries are rather slow, all involving full table scans. To improve on that, see http://mysql.rjweb.org/doc.php/groupwise_max
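
For comparison, here is one common formulation of groupwise max using a window function. It is only a sketch: it assumes MySQL 8.0 or later, it is not necessarily the technique the linked article recommends, and it may still scan the whole table, so compare it against the join version with EXPLAIN on real data.

```sql
-- Sketch only: requires MySQL 8.0+ (window functions).
-- Numbers each manufacturer's rows from newest to oldest, then keeps row 1.
SELECT manuf_code
     , manuf_display_name
     , record_timestamp
     , record_status
  FROM ( SELECT m.manuf_code
              , m.manuf_display_name
              , m.record_timestamp
              , m.record_status
              , ROW_NUMBER() OVER ( PARTITION BY m.manuf_code
                                    ORDER BY m.record_timestamp DESC ) AS rn
           FROM manufacturers AS m
          WHERE m.record_status IN ('N', 'U')
       ) AS ranked
 WHERE rn = 1;
```

Because the primary key starts with manuf_code followed by the timestamp column, the rows for each manufacturer are already stored together in timestamp order, which is the ordering this query partitions and sorts by.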

History vs Current

Another approach is to keep 2 tables:

  • History of actions; this is what you currently have. It is mostly INSERTed into.
  • Current status for each item. This would be trivial to fetch from. It is mostly UPDATEd. Or, better yet, use INSERT ... ON DUPLICATE KEY UPDATE ... so that new items can be inserted without extra statements.
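
A minimal sketch of what the Current table and its upsert might look like. The table name manufacturers_current is hypothetical, and the columns are simply copied from the question's schema.

```sql
-- Hypothetical "current" table: exactly one row per manufacturer.
CREATE TABLE manufacturers_current (
  manuf_code         VARCHAR(20) NOT NULL,
  manuf_display_name VARCHAR(50) NOT NULL,
  record_status      VARCHAR(1)  NOT NULL,
  record_timestamp   DATETIME    NOT NULL,
  PRIMARY KEY (manuf_code)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- Inserts the row if the code is new; otherwise overwrites the existing row.
-- VALUES(col) refers to the value that would have been inserted. It is
-- deprecated as of MySQL 8.0.20 in favour of a row alias, but still accepted.
INSERT INTO manufacturers_current
       (manuf_code, manuf_display_name, record_status, record_timestamp)
VALUES ('M000002', 'JII', 'N', '2017-09-13 17:16:35')
    ON DUPLICATE KEY UPDATE
       manuf_display_name = VALUES(manuf_display_name),
       record_status      = 'U',
       record_timestamp   = VALUES(record_timestamp);

-- Fetching the latest entry for every manufacturer is then a plain read.
SELECT manuf_code, manuf_display_name, record_status, record_timestamp
  FROM manufacturers_current;
```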

You say "When user creates / updates ...". How is this being performed? I hope they are not issuing SQL statements. I suggest you consider some subroutine (in client code) or Stored Procedure (in MySQL). That way, you can hide the details of the two tables, etc, from the user.
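
As a rough illustration, a stored procedure could write to both tables in one call. Everything named here is an assumption: the procedure save_manufacturer, its parameters, and a manufacturers_current table keyed on manuf_code alone. Concurrent calls for the same code are not handled.

```sql
DELIMITER //

-- Hypothetical procedure: records one create/update in both tables.
CREATE PROCEDURE save_manufacturer(
  IN p_code VARCHAR(20),
  IN p_name VARCHAR(50)
)
BEGIN
  DECLARE v_now    DATETIME DEFAULT NOW();
  DECLARE v_status VARCHAR(1);

  -- Undo partial work and re-raise the error if any statement fails.
  DECLARE EXIT HANDLER FOR SQLEXCEPTION
  BEGIN
    ROLLBACK;
    RESIGNAL;
  END;

  START TRANSACTION;

  -- 'N' for a code not seen before, 'U' otherwise.
  SELECT IF(EXISTS(SELECT 1 FROM manufacturers_current
                    WHERE manuf_code = p_code), 'U', 'N')
    INTO v_status;

  -- History: always append. Two saves of the same code within one second
  -- would collide on the (manuf_code, timestamp) primary key.
  INSERT INTO manufacturers
         (manuf_code, record_status, manuf_display_name, record_timestamp)
  VALUES (p_code, v_status, p_name, v_now);

  -- Current: insert or overwrite the single row for this code.
  INSERT INTO manufacturers_current
         (manuf_code, manuf_display_name, record_status, record_timestamp)
  VALUES (p_code, p_name, v_status, v_now)
      ON DUPLICATE KEY UPDATE
         manuf_display_name = VALUES(manuf_display_name),
         record_status      = VALUES(record_status),
         record_timestamp   = VALUES(record_timestamp);

  COMMIT;
END //

DELIMITER ;

-- Example call from the application:
CALL save_manufacturer('M000008', 'Acme');
```

The application then only needs permission to call the procedure, and the two-table layout can change later without touching client code.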

Bulk Upload

You say lots of inserts/updates/etc. are provided en masse? Load them into a temporary table (either CREATE TEMPORARY TABLE or a permanent table that you TRUNCATE and reuse). Then write a relatively small number of SQL statements to combine the data to put into Current and shovel it (mostly intact) into History.
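
The staging flow might look roughly like this. The names manufacturers_staging and manufacturers_current are hypothetical, the sample rows are made up for illustration, and manufacturers_current is assumed to have manuf_code as its primary key.

```sql
-- Hypothetical staging table with the same columns as the history table.
CREATE TEMPORARY TABLE manufacturers_staging LIKE manufacturers;

-- Load the batch here (LOAD DATA or multi-row INSERT); sample rows only.
INSERT INTO manufacturers_staging
       (manuf_code, record_status, manuf_display_name, record_timestamp)
VALUES ('M000008', 'N', 'Acme',     '2017-09-18 09:00:00'),
       ('M000008', 'U', 'Acme Ltd', '2017-09-18 09:05:00'),
       ('M000002', 'U', 'JII Corp', '2017-09-18 09:10:00');

-- 1. History: shovel the batch in unchanged.
INSERT INTO manufacturers
       (manuf_code, record_status, manuf_display_name, record_timestamp)
SELECT manuf_code, record_status, manuf_display_name, record_timestamp
  FROM manufacturers_staging;

-- 2. Current: upsert, letting only a newer-or-equal timestamp overwrite.
--    This relies on MySQL evaluating the assignments left to right, so
--    record_timestamp is deliberately updated last.
INSERT INTO manufacturers_current
       (manuf_code, manuf_display_name, record_status, record_timestamp)
SELECT manuf_code, manuf_display_name, record_status, record_timestamp
  FROM manufacturers_staging
    ON DUPLICATE KEY UPDATE
       manuf_display_name = IF(VALUES(record_timestamp) >= manufacturers_current.record_timestamp,
                               VALUES(manuf_display_name),
                               manufacturers_current.manuf_display_name),
       record_status      = IF(VALUES(record_timestamp) >= manufacturers_current.record_timestamp,
                               VALUES(record_status),
                               manufacturers_current.record_status),
       record_timestamp   = GREATEST(manufacturers_current.record_timestamp,
                                     VALUES(record_timestamp));

DROP TEMPORARY TABLE manufacturers_staging;
```

Each statement references the temporary table only once, which matters because MySQL does not allow a TEMPORARY table to appear more than once in the same query.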
