
Approach to partitioning a large MySQL InnoDB table

I have a table which will receive 45-60 million rows of IoT-type data a year. The initial desire is to never delete data, as we might use it for different types of "big data analysis". Today this table needs to support our online application. The app needs fast query times for data that is usually within the last 30 or 90 days, so I was thinking that partitioning might be a good idea.

Our current thinking is to use an 'aging' column, called partition_id in this case. Records within the last 30 days have partition_id = 0, records 31 to 90 days old have partition_id = 1, and everything older is in partition_id = 2.

All queries will 'know' which partition_id they want to use. Within that partition, queries always filter by sensor_id, badge_id, etc. (see the indexes), usually for all the sensor_ids or badge_ids within a group, i.e. sensor_id IN (3, 15, 35, 100, 1024) and so on.

Here's the table definition:

    CREATE TABLE `device_messages` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `partition_id` tinyint(3) unsigned NOT NULL DEFAULT '0',
      `customer_id` int(10) unsigned NOT NULL,
      `unix_timestamp` double(12,2) NOT NULL,
      `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
      `timezone_id` smallint(5) unsigned NOT NULL,
      `event_date` date NOT NULL,
      `is_day_shift` tinyint(1) unsigned NOT NULL,
      `msg_id` tinyint(3) unsigned NOT NULL,
      `sensor_id` int(10) unsigned NOT NULL,
      `sensor_role_id` int(10) unsigned NOT NULL,
      `sensor_box_build_id` int(10) unsigned NOT NULL,
      `gateway_id` int(10) unsigned NOT NULL,
      `location_hierarchy_id` int(10) unsigned NOT NULL,
      `group_hierarchy_id` int(10) unsigned DEFAULT NULL,
      `badge_id` int(10) unsigned NOT NULL,
      `is_badge_deleted` tinyint(1) DEFAULT NULL,
      `user_id` int(10) unsigned DEFAULT NULL,
      `is_user_deleted` tinyint(1) DEFAULT NULL,
      `badge_battery` double unsigned DEFAULT NULL,
      `scan_duration` int(10) unsigned DEFAULT NULL,
      `reading_count` tinyint(3) unsigned DEFAULT NULL,
      `median_rssi_reading` tinyint(4) DEFAULT NULL,
      `powerup_counter` int(10) unsigned DEFAULT NULL,
      `tx_counter` int(10) unsigned DEFAULT NULL,
      `activity_counter` int(10) unsigned DEFAULT NULL,
      `still_counter` int(10) unsigned DEFAULT NULL,
      `created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
      PRIMARY KEY (`id`, `partition_id`, `sensor_id`, `event_date`),
      KEY `sensor_id_query_index` (`partition_id`, `sensor_id`, `event_date`),
      KEY `badge_id_query_index` (`partition_id`, `badge_id`, `event_date`),
      KEY `location_hierarchy_id_query_index` (`partition_id`, `location_hierarchy_id`, `event_date`),
      KEY `group_hierarchy_id_query_index` (`partition_id`, `group_hierarchy_id`, `event_date`)
    ) ENGINE = InnoDB AUTO_INCREMENT = 1 DEFAULT CHARSET = utf8 COLLATE = utf8_unicode_ci
    PARTITION BY RANGE (partition_id)
    SUBPARTITION BY HASH (sensor_id) (
      PARTITION fresh VALUES LESS THAN (1) (
        SUBPARTITION f0 ENGINE = InnoDB,
        SUBPARTITION f1 ENGINE = InnoDB,
        SUBPARTITION f2 ENGINE = InnoDB,
        SUBPARTITION f3 ENGINE = InnoDB,
        SUBPARTITION f4 ENGINE = InnoDB,
        SUBPARTITION f5 ENGINE = InnoDB,
        SUBPARTITION f6 ENGINE = InnoDB,
        SUBPARTITION f7 ENGINE = InnoDB,
        SUBPARTITION f8 ENGINE = InnoDB,
        SUBPARTITION f9 ENGINE = InnoDB),
      PARTITION archive VALUES LESS THAN (2) (
        SUBPARTITION a0 ENGINE = InnoDB,
        SUBPARTITION a1 ENGINE = InnoDB,
        SUBPARTITION a2 ENGINE = InnoDB,
        SUBPARTITION a3 ENGINE = InnoDB,
        SUBPARTITION a4 ENGINE = InnoDB,
        SUBPARTITION a5 ENGINE = InnoDB,
        SUBPARTITION a6 ENGINE = InnoDB,
        SUBPARTITION a7 ENGINE = InnoDB,
        SUBPARTITION a8 ENGINE = InnoDB,
        SUBPARTITION a9 ENGINE = InnoDB),
      PARTITION deep_archive VALUES LESS THAN MAXVALUE (
        SUBPARTITION C0 ENGINE = InnoDB,
        SUBPARTITION C1 ENGINE = InnoDB,
        SUBPARTITION C2 ENGINE = InnoDB,
        SUBPARTITION C3 ENGINE = InnoDB,
        SUBPARTITION C4 ENGINE = InnoDB,
        SUBPARTITION C5 ENGINE = InnoDB,
        SUBPARTITION C6 ENGINE = InnoDB,
        SUBPARTITION C7 ENGINE = InnoDB,
        SUBPARTITION C8 ENGINE = InnoDB,
        SUBPARTITION C9 ENGINE = InnoDB));

This table definition is currently working with 16 million rows of data, and queries seem to be fast. However, I'm concerned about the long-term sustainability of this implementation. Plus, I now see that we are doing a lot of churn on the partitions as we 'age' the records by updating the partition_id of tens of thousands of rows per week.

The queries will almost always be a variant of this:

    SELECT * FROM device_messages
    WHERE partition_id = 0
      AND `event_date` BETWEEN '2019-08-07' AND '2019-08-13'
      AND `sensor_id` IN ( 3317, 3322, 3323, 3327, 3328, 3329, 3331, 3332, 3333, 3334, 3335, 3336, 3337, 3338, 3339, 3340, 3341, 3342 )
      ORDER BY `unix_timestamp` ASC

There could be as few as one sensor_id in the list but often will be several.

I've spent hours researching partitioning but haven't found an example or discussion of exactly this use case. Since we're using the artificial aging column partition_id in this way, I also realize that I can't do any true manipulation of the partitions, so I think I'm losing at least some of the value of partitioning.

Advice on partitioning schemes or even alternative approaches would be greatly appreciated.

PARTITIONing is not a performance panacea.

Not deleting? OK, the main use (DROP PARTITION is faster than DELETE) is not available.

Summary Tables are the answer to Data Warehouse performance problems. See http://mysql.rjweb.org/doc.php/summarytables

(Now I will read the Question in detail and any answers; maybe I will come back if I have something to change.)

Schema critique

Since you anticipate millions of rows, shrinking datatypes is rather important.

customer_id is a 4-byte integer. If you don't anticipate more than a few thousand customers, use a 2-byte SMALLINT UNSIGNED. See also MEDIUMINT UNSIGNED (3 bytes). Ditto for all the other INTs.

`unix_timestamp` double(12,2) is quite strange. What's wrong with TIMESTAMP(2), which would be smaller?

`badge_battery` double -- excessive resolution? DOUBLE is 8 bytes; FLOAT is 4 and has ~7 significant digits.

Most columns are NULLable. Are they really optional? (NULL has a tiny overhead; use NOT NULL where practical.)
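
A sketch of what the shrinking could look like; the target types here are assumptions that must be validated against the real data ranges, shown only to make the sizes concrete:

    ALTER TABLE device_messages
      MODIFY customer_id   SMALLINT UNSIGNED NOT NULL,   -- 4 bytes -> 2
      MODIFY sensor_id     MEDIUMINT UNSIGNED NOT NULL,  -- 4 bytes -> 3
      MODIFY badge_battery FLOAT UNSIGNED DEFAULT NULL;  -- 8 bytes -> 4
    -- `unix_timestamp` needs a value conversion, not just a type change:
    -- blindly MODIFYing an epoch-seconds DOUBLE to TIMESTAMP(2) would
    -- misinterpret the stored numbers.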

When rows age out of being "fresh", will you do a massive UPDATE to change that column? Please consider the large impact that statement will have. It is better to create new partitions and change the queries; this works especially well if queries have AND some_date_column > some_cutoff and the table is PARTITION BY RANGE(TO_DAYS(some_date_column)).
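
For illustration (monthly ranges are an assumption; pick whatever granularity matches the 30/90-day query windows), a date-based layout could look like this. Rows then "age" by themselves, and maintenance touches partitions instead of rows:

    -- event_date is already in the PRIMARY KEY, as RANGE partitioning requires.
    ALTER TABLE device_messages
    PARTITION BY RANGE (TO_DAYS(event_date)) (
      PARTITION p2019_07 VALUES LESS THAN (TO_DAYS('2019-08-01')),
      PARTITION p2019_08 VALUES LESS THAN (TO_DAYS('2019-09-01')),
      PARTITION pmax     VALUES LESS THAN MAXVALUE
    );

    -- Monthly maintenance: carve the next month out of pmax instead of
    -- UPDATEing tens of thousands of rows to "age" them.
    ALTER TABLE device_messages REORGANIZE PARTITION pmax INTO (
      PARTITION p2019_09 VALUES LESS THAN (TO_DAYS('2019-10-01')),
      PARTITION pmax     VALUES LESS THAN MAXVALUE
    );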

I have yet to see a justification for SUBPARTITIONing .

Non-partition

Given that this is typical:

SELECT * FROM device_messages
WHERE partition_id = 0
  AND `event_date` BETWEEN '2019-08-07' AND '2019-08-13'
  AND `sensor_id` IN ( 3317, 3322, 3323, 3327, 3328, 3329, 3331, 3332,
                       3333, 3334, 3335, 3336, 3337, 3338, 3339, 3340, 3341, 3342 )
  ORDER BY `unix_timestamp` ASC

I would suggest the following:

  • No partitioning (and no partition_id column)
  • Toss event_date; use unix_timestamp instead
  • Change the SELECT as follows:

...

SELECT * FROM device_messages
WHERE `unix_timestamp` >= '2019-08-07'
  AND `unix_timestamp`  < '2019-08-07' + INTERVAL 1 WEEK
  AND sensor_id in ( 3317, 3322, 3323, 3327, 3328, 3329, 3331, 3332,
                     3333, 3334, 3335, 3336, 3337, 3338, 3339, 3340, 3341, 3342 )
  ORDER BY `unix_timestamp` asc

And add

INDEX(sensor_id, `unix_timestamp`)

Then, I think, the processing will be as follows. (Note: it may be worse than this in some older versions of MySQL/MariaDB.)

  1. Drill down the BTree for the new index to [3317, '2019-08-07']
  2. Scan forward (collecting rows into a temp) for the week
  3. Repeat steps 1 and 2 for each other sensor_id.
  4. Sort the temp table (to satisfy the ORDER BY).
  5. Deliver result rows.

The key point here is that it reads only exactly the rows that need to be delivered (plus one extra row per sensor to realize the week is over). Since this is a huge table, this is as good as it gets.

The extra sort (cf. EXPLAIN's "filesort") is necessary because there is no way to fetch the rows already in ORDER BY order.
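
To check that this is really the plan chosen, one can run EXPLAIN on the rewritten query. This is a sketch: exact output varies by version, but with the suggested index one would expect access type "range" plus "Using filesort" in the Extra column.

    EXPLAIN
    SELECT * FROM device_messages
    WHERE `unix_timestamp` >= '2019-08-07'
      AND `unix_timestamp`  < '2019-08-07' + INTERVAL 1 WEEK
      AND sensor_id IN ( 3317, 3322, 3323 )
    ORDER BY `unix_timestamp` ASC;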

There is still another optimization...

In the above, the index was in order, but the data was not. We can fix that as follows:

PRIMARY KEY(sensor_id, `unix_timestamp`, id),  -- (`id` adds uniqueness)
INDEX(id),   -- to keep AUTO_INCREMENT happy

(and skip my previous index suggestion)

This modification will become especially beneficial if the table becomes bigger than the buffer_pool. This is because of the "clustering" provided by the revised PK.
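
A sketch of one way to apply the revised keys. Note that this rebuilds the clustered index (a full table copy), and it assumes the partitioning has already been dropped per the advice above, since a partitioned table would require the partitioning column in every unique key:

    -- Assumes partitioning was removed first:
    --   ALTER TABLE device_messages REMOVE PARTITIONING;
    ALTER TABLE device_messages
      DROP PRIMARY KEY,
      ADD PRIMARY KEY (sensor_id, `unix_timestamp`, id),
      ADD INDEX (id);   -- keeps AUTO_INCREMENT valid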

More Normalization

I suspect that many of those ~30 columns are identical from row to row, especially for the same sensor (aka 'device'?). If I am correct, then you 'should' remove those columns from this huge table and put them into another table, de-dupped.

This would save even more space than tweaking INTs, etc.
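
As a hypothetical sketch (the table name sensor_attributes and the choice of columns are assumptions; only you know which columns really are constant per sensor):

    -- Hypothetical: attributes that are fixed per sensor move to a small,
    -- de-dupped table instead of repeating on every message row.
    CREATE TABLE sensor_attributes (
      sensor_id           INT UNSIGNED NOT NULL,
      sensor_role_id      INT UNSIGNED NOT NULL,
      sensor_box_build_id INT UNSIGNED NOT NULL,
      gateway_id          INT UNSIGNED NOT NULL,
      PRIMARY KEY (sensor_id)
    ) ENGINE=InnoDB;
    -- device_messages then keeps only sensor_id and JOINs when needed.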

Summary Table

Again, using your query, let's discuss what summary table would be useful. But first, I don't see what would be useful to summarize. I would expect to see a device_value FLOAT or something like that. I'll use that as a hypothetical example:

CREATE TABLE Summary (
        event_date DATE NOT NULL, -- reconstructed from `unix_timestamp`
        sensor_id ...,
        ct SMALLINT UNSIGNED,  -- number of readings for the day
        sum_value FLOAT NOT NULL,  -- SUM(device_value)
        sum2,  -- if you need the standard deviation
        min_value, etc.,  -- if you want those
        PRIMARY KEY(sensor_id, event_date)
    ) ENGINE=InnoDB;

Once a day:

INSERT INTO Summary (sensor_id, event_date, ct, sum_value, ...)
        SELECT sensor_id, DATE(`unix_timestamp`),
                          COUNT(*), SUM(device_value), ...
            FROM device_messages
            WHERE `unix_timestamp` >= CURDATE() - INTERVAL 1 DAY
             AND `unix_timestamp`  < CURDATE()
           GROUP BY sensor_id, DATE(`unix_timestamp`);  -- DATE() included to satisfy ONLY_FULL_GROUP_BY
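
One way to run that roll-up automatically is MySQL's event scheduler; the event name and start time here are assumptions, and device_value remains the hypothetical column from above:

    -- Requires the scheduler: SET GLOBAL event_scheduler = ON;
    CREATE EVENT ev_summarize_device_messages
    ON SCHEDULE EVERY 1 DAY
    STARTS CURRENT_DATE + INTERVAL 1 DAY + INTERVAL 5 MINUTE
    DO
      INSERT INTO Summary (sensor_id, event_date, ct, sum_value)
        SELECT sensor_id, DATE(`unix_timestamp`), COUNT(*), SUM(device_value)
          FROM device_messages
          WHERE `unix_timestamp` >= CURDATE() - INTERVAL 1 DAY
            AND `unix_timestamp`  < CURDATE()
          GROUP BY sensor_id, DATE(`unix_timestamp`);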

(There are more robust ways; there are more timely ways; etc.) Or you may want to summarize by hour instead of day. In any case, you can get an arbitrary date range by summing the sums from the daily summaries.

 Average:  SUM(sum_value) / SUM(ct)
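
For example, a month of averages per sensor then comes from the small Summary table instead of the huge fact table:

    SELECT sensor_id,
           SUM(sum_value) / SUM(ct) AS avg_value
      FROM Summary
      WHERE event_date >= '2019-08-01'
        AND event_date <  '2019-09-01'
      GROUP BY sensor_id;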

Redundancy?

unix_timestamp, timestamp, event_date, created_at -- do all of these have the "same" value and meaning?

A note on DATE -- it is almost always easier to pick apart a DATETIME or TIMESTAMP than to have an extra column, and especially than having both DATE and TIME .

Without a date column, checking for all readings for one day needs to look something like:

    WHERE `dt` >= '2019-08-07'
      AND `dt`  < '2019-08-07' + INTERVAL 1 DAY
