Use Case:
I have a table, let's say "manufacturers":
manuf_code  manuf_display_name  record_status  record_timestamp
----------  ------------------  -------------  -------------------
M000001     Sam                 N              2017-09-13 12:13:16
M000002     JII                 N              2017-09-13 15:13:15
M000002     JII                 U              2017-09-13 17:16:35
M000003     Sun                 N              2017-09-13 18:54:16
M000004     NG-Graphics         N              2017-09-13 19:13:15
M000004     NG-Graphics         U              2017-09-14 20:16:50
M000004     NG-Graphics         U              2017-09-14 09:13:25
M000005     HewNett             N              2017-09-15 10:24:19
M000006     HewNett             N              2017-09-15 10:24:19
M000007     HewNett             N              2017-09-15 10:24:19
M000007     HewNett             U              2017-09-15 15:10:16
M000007     HewNett             U              2017-09-17 21:35:19
M000007     HewNett             U              2017-09-17 21:37:26
There can be around 7-10 million such entries, with multiple rows per manufacturer.
Requirement: I need to fetch the latest entry for each manufacturer.
My query:
SELECT m.manuf_code
     , m.manuf_display_name
     , m.record_timestamp
     , m.record_status
  FROM manufacturers m
  JOIN
     ( SELECT manuf_code
            , MAX(record_timestamp) AS maxdate
         FROM manufacturers
        WHERE record_status = 'N' OR record_status = 'U'
        GROUP
           BY manuf_code) mn
    ON m.manuf_code = mn.manuf_code
   AND m.record_timestamp = mn.maxdate
I preferred the JOIN with a subquery because it was the faster option when fetching around 7 million rows.
But I need this to run faster still, because after fetching this much data I may also have to INSERT the same rows into some table with a new record_status.
Please suggest.
EDIT:
CREATE TABLE `manufacturers` (
  `manuf_code` varchar(20) NOT NULL,
  `record_status` varchar(1) NOT NULL,
  `manuf_display_name` varchar(50) NOT NULL,
  `record_timestamp` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`manuf_code`, `record_timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
EXPLANATION:
A new entry has status 'N'; an update of an existing entry has status 'U'. That's all, and the query should return the latest row per manufacturer across these statuses.
Another case, specific to the requirement: we fetch the latest entry for each manufacturer, set its status to 'L', and INSERT it again.
The immediate question is addressed first, then an alternative design is discussed:
Groupwise Max
This is a "groupwise max" problem. For multi-million row tables, the typical queries are rather slow, all involving full table scans. To improve on that, see http://mysql.rjweb.org/doc.php/groupwise_max
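As one possible sketch (not necessarily the technique from the linked article), the query from the question can be rearranged so that the inner GROUP BY has no WHERE clause. This assumes the primary key is (manuf_code, record_timestamp), as the CREATE TABLE above appears to intend, and it assumes that filtering on record_status after picking the latest row is acceptable: a manufacturer whose latest row has some other status (such as 'L') would then be omitted rather than falling back to an older 'N'/'U' row.

```sql
-- Sketch only: with no WHERE in the derived table, MySQL may be able to
-- resolve MAX(record_timestamp) per manuf_code from the primary key alone.
-- Check the plan; look for "Using index for group-by" in the Extra column.
EXPLAIN
SELECT m.manuf_code
     , m.manuf_display_name
     , m.record_timestamp
     , m.record_status
  FROM ( SELECT manuf_code
              , MAX(record_timestamp) AS maxdate
           FROM manufacturers
          GROUP BY manuf_code ) mn
  JOIN manufacturers m
    ON m.manuf_code = mn.manuf_code
   AND m.record_timestamp = mn.maxdate
 -- Assumption: the status filter is applied to the latest row only.
 WHERE m.record_status IN ('N', 'U');
```

Whether this is actually faster depends on the MySQL version and the data distribution, so compare the EXPLAIN output and timings against the original query before relying on it.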
History vs Current
Another approach is to keep 2 tables:
History: the log of actions; this is what you currently have. It is mostly INSERTed into.
Current: the status for each item. This would be trivial to fetch from. It is mostly UPDATEd. Or, better yet, use INSERT ... ON DUPLICATE KEY UPDATE ... so that new items can be inserted without extra statements.
You say "When user creates / updates ...". How is this being performed? I hope they are not issuing SQL statements. I suggest you consider some subroutine (in client code) or Stored Procedure (in MySQL). That way, you can hide the details of the two tables, etc., from the user.
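A minimal sketch of the two-table idea follows. The table name manufacturers_current and the sample values are hypothetical; the existing manufacturers table plays the role of History. Each create/update writes one row to History and upserts one row in Current.

```sql
-- Hypothetical "Current" table: exactly one row per manufacturer.
CREATE TABLE manufacturers_current (
  manuf_code         varchar(20) NOT NULL,
  record_status      varchar(1)  NOT NULL,
  manuf_display_name varchar(50) NOT NULL,
  record_timestamp   datetime    NOT NULL,
  PRIMARY KEY (manuf_code)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- History: always append (sample values are illustrative).
INSERT INTO manufacturers
       (manuf_code, record_status, manuf_display_name, record_timestamp)
VALUES ('M000007', 'U', 'HewNett', '2017-09-17 21:37:26');

-- Current: insert if the manufacturer is new, otherwise overwrite its row.
-- VALUES(col) is deprecated in newer MySQL 8.0 releases in favor of a
-- row alias, but it is used here because it also works on older versions.
INSERT INTO manufacturers_current
       (manuf_code, record_status, manuf_display_name, record_timestamp)
VALUES ('M000007', 'U', 'HewNett', '2017-09-17 21:37:26')
ON DUPLICATE KEY UPDATE
       record_status      = VALUES(record_status),
       manuf_display_name = VALUES(manuf_display_name),
       record_timestamp   = VALUES(record_timestamp);

-- Fetching the latest entry per manufacturer is then a plain scan.
SELECT manuf_code, manuf_display_name, record_timestamp, record_status
  FROM manufacturers_current;
```

Wrapping the two INSERTs in one transaction, inside a client subroutine or a Stored Procedure, would keep the two tables consistent with each other.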
Bulk Upload
You say lots of inserts/updates/etc. are provided en masse? Load such into a temporary table (either CREATE TEMPORARY or a permanent table that you TRUNCATE and reuse). Then write a relatively small number of SQL statements to combine the data to put into Current and shovel (mostly intact) into History.
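The bulk flow could look roughly like the sketch below. The names manufacturers_staging and manufacturers_current are hypothetical, and it assumes every batch row is at least as new as what Current already holds. A permanent staging table is used here because MySQL does not allow a TEMPORARY table to be referenced more than once in the same query, which step 2 needs.

```sql
-- Hypothetical staging table with the same layout as History; emptied per batch.
CREATE TABLE IF NOT EXISTS manufacturers_staging LIKE manufacturers;
TRUNCATE TABLE manufacturers_staging;

-- (Load the batch into manufacturers_staging here, for example with
--  LOAD DATA or multi-row INSERT statements.)

-- 1. Shovel the batch, intact, into History.
INSERT INTO manufacturers
       (manuf_code, record_status, manuf_display_name, record_timestamp)
SELECT manuf_code, record_status, manuf_display_name, record_timestamp
  FROM manufacturers_staging;

-- 2. Fold only the latest staged row per manufacturer into Current.
INSERT INTO manufacturers_current
       (manuf_code, record_status, manuf_display_name, record_timestamp)
SELECT s.manuf_code, s.record_status, s.manuf_display_name, s.record_timestamp
  FROM manufacturers_staging s
  JOIN ( SELECT manuf_code
              , MAX(record_timestamp) AS maxdate
           FROM manufacturers_staging
          GROUP BY manuf_code ) x
    ON s.manuf_code = x.manuf_code
   AND s.record_timestamp = x.maxdate
ON DUPLICATE KEY UPDATE
       record_status      = VALUES(record_status),
       manuf_display_name = VALUES(manuf_display_name),
       record_timestamp   = VALUES(record_timestamp);
```

The groupwise max in step 2 runs only over the batch, not over the multi-million row History table, which is where the saving is expected to come from.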