
Would partitioning the table improve the performance of this GROUP BY query?

I have a MySQL table, say data_table:

mysql> desc data_table;
+------------+------------------+------+-----+---------+----------------+
| Field      | Type             | Null | Key | Default | Extra          |
+------------+------------------+------+-----+---------+----------------+
| id         | int(11)          | NO   | PRI | NULL    | auto_increment |
| prod_id    | int(10) unsigned | NO   |     | NULL    |                |
| date       | date             | NO   |     | NULL    |                |
| cost       | double           | NO   |     | NULL    |                |
+------------+------------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

This table has around 700 million rows. I have created indexes on prod_id and date. I need to run a query like this:

SELECT `id`, `prod_id`, WEEKOFYEAR(`date`) AS period, SUM(`cost`) AS cost_sum
FROM `data_table` GROUP BY `prod_id`, `period`;

My question is -

Will partitioning the table on months (~20 partitions) improve the performance of this query?

Based on the number of records and the SQL query you have written, I would say yes: if done correctly, partitioning would help a lot. I would go further and suggest range partitioning on the date field. This is a very common partitioning method that works well and is easy to implement.
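For illustration, range partitioning by month could be set up roughly as follows. This is only a sketch: the partition names and date boundaries are made up, and it assumes a MySQL build with partitioning support. MySQL requires the partitioning column to be part of every unique key, so the primary key has to be widened to include date first.

```sql
-- Sketch only: partition names and boundary dates are hypothetical.
-- Both statements rebuild the table, which takes a long time on 700 million rows.

-- The partitioning column must be part of every unique key,
-- so extend the primary key from (id) to (id, date).
ALTER TABLE data_table
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (id, `date`);

-- One partition per month, plus a catch-all for later dates.
ALTER TABLE data_table
PARTITION BY RANGE (TO_DAYS(`date`)) (
    PARTITION p2013_01 VALUES LESS THAN (TO_DAYS('2013-02-01')),
    PARTITION p2013_02 VALUES LESS THAN (TO_DAYS('2013-03-01')),
    PARTITION p2013_03 VALUES LESS THAN (TO_DAYS('2013-04-01')),
    PARTITION pmax     VALUES LESS THAN MAXVALUE
);
```

Partition pruning only applies when a query restricts date in its WHERE clause, so it is worth checking the query plan before committing to the rebuild.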

You don't mention which release of MySQL you're running, so you'll have to do some additional reading in the MySQL partitioning documentation to understand what your release supports.

You can also run this SQL at the command prompt.

mysql> SHOW VARIABLES LIKE '%partition%';

This should report back with "have Partitioning = Yes" or "Partition_engine = yes", depending on your release.
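On newer releases that variable may not exist at all. As far as I know, MySQL 5.6 and 5.7 expose partitioning as a plugin instead, so a check along these lines may be more useful there (treat the version boundaries as an assumption and verify against the manual for your release):

```sql
-- Assumes MySQL 5.6 / 5.7, where partitioning support is reported as a plugin.
-- A row with PLUGIN_STATUS = 'ACTIVE' means partitioning is available.
SELECT PLUGIN_NAME, PLUGIN_VERSION, PLUGIN_STATUS
FROM INFORMATION_SCHEMA.PLUGINS
WHERE PLUGIN_NAME = 'partition';
```

An empty result on those versions suggests the server was built or started without partitioning support.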

If you see that a lot of your queries are based on the week number, it makes sense to store the week number permanently as a column. That saves the calculation on every SELECT. The ideal strategy is to know which queries you will run and then design your tables accordingly.
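A minimal sketch of that idea follows. The column name week_of_year and the index name are hypothetical, not part of the original schema, and the composite index is an extra suggestion so the aggregate can be answered from the index alone.

```sql
-- Sketch only: `week_of_year` and `idx_prod_week_cost` are hypothetical names.
-- Each statement touches every row, so expect long run times on 700 million rows.

-- WEEKOFYEAR() returns 1..53, which fits in a TINYINT UNSIGNED.
ALTER TABLE data_table
    ADD COLUMN week_of_year TINYINT UNSIGNED NOT NULL DEFAULT 0;

-- Backfill existing rows; new rows must set the column on insert.
UPDATE data_table
SET week_of_year = WEEKOFYEAR(`date`);

-- Covering index: grouping columns first, then the summed column.
ALTER TABLE data_table
    ADD INDEX idx_prod_week_cost (prod_id, week_of_year, cost);

-- The report query no longer computes the week per row.
SELECT prod_id, week_of_year AS period, SUM(cost) AS cost_sum
FROM data_table
GROUP BY prod_id, week_of_year;
```

As in the original query, the same week number from different years lands in the same group; add a year column as well if that is not what you want.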

PARTITIONing will not help at all. Not BY RANGE; not any other flavor.

The query must read every row in the table; partitioning does not change that fact, nor can it speed it up at all.

The query, as it stands, has an unrelated problem. Which id is it supposed to return for each group? Answer: it will return a 'random' id.
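Two ways to make the result well defined, depending on what was intended (a sketch; which one is right depends on whether the id is actually needed):

```sql
-- Option 1: drop id, since it has no single value per group.
SELECT prod_id, WEEKOFYEAR(`date`) AS period, SUM(cost) AS cost_sum
FROM data_table
GROUP BY prod_id, period;

-- Option 2: keep an id, but say explicitly which one (here the smallest).
SELECT MIN(id) AS first_id, prod_id, WEEKOFYEAR(`date`) AS period, SUM(cost) AS cost_sum
FROM data_table
GROUP BY prod_id, period;
```

With the ONLY_FULL_GROUP_BY SQL mode enabled, the original query is rejected with an error instead of returning an arbitrary id; both versions above run under that mode.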
