
Partitioned table using all partitions for basic SELECT statement in MySQL

I have a table in MySQL partitioned by HASH on YEAR(date). The goal is to distribute my data into roughly one partition per year.

When executing a basic SELECT statement:

EXPLAIN PARTITIONS
SELECT date 
FROM date_table 
WHERE date >= '2008-01-01' AND date <= '2009-01-01'

...all partitions are being used. I would expect only some of the partitions to be used, two at most. What am I missing about how partitions work?

test.sql

DROP TABLE IF EXISTS `tmp_date_table`;

CREATE TABLE `tmp_date_table` (
    `date_id` INT(11) NOT NULL,
    `date` DATE NOT NULL,
    PRIMARY KEY (`date_id`, `date`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
PARTITION BY HASH (year(date))
PARTITIONS 11 
;

INSERT INTO `tmp_date_table`(date_id, date) 
VALUES
(1, '2000-01-01'),
(2, '2001-01-01'),
(3, '2002-01-01'),
(4, '2003-01-01'),
(5, '2004-01-01'),
(6, '2005-01-01'),
(7, '2006-01-01'),
(8, '2007-01-01'),
(9, '2008-01-01'),
(10, '2009-01-01'),
(11, '2010-01-01');

EXPLAIN PARTITIONS
SELECT date FROM tmp_date_table WHERE date >= '2008-01-01' AND date <= '2009-01-01';

DROP TABLE IF EXISTS `tmp_date_table`;

Any help is appreciated.

So it looks like you are setting this up correctly. I dug a little deeper.

http://dev.mysql.com/doc/refman/5.7/en/partitioning-pruning.html

When a table is partitioned by HASH or [LINEAR] KEY, pruning can be used only on integer columns. For example, this statement cannot use pruning because dob is a DATE column:

SELECT * FROM t4 WHERE dob >= '2001-04-14' AND dob <= '2005-10-15';

So you can't do what you are doing with HASH.

However, if the table stores year values in an INT column, then a query having WHERE year_col >= 2001 AND year_col <= 2005 can be pruned.
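As a minimal sketch of what the quoted passage describes, the year could be stored in its own INT column and used as the HASH key. The table name tmp_year_table is hypothetical and year_col is borrowed from the documentation quote; neither comes from the question. The same documentation page also notes that range pruning on HASH tables applies only when the range is smaller than the number of partitions, so treat this as something to verify with EXPLAIN rather than a guaranteed result.

```sql
-- Hypothetical variant of the question's table: the year is stored
-- redundantly in an INT column so that HASH pruning can use it.
DROP TABLE IF EXISTS `tmp_year_table`;

CREATE TABLE `tmp_year_table` (
    `date_id`  INT NOT NULL,
    `date`     DATE NOT NULL,
    `year_col` INT NOT NULL,
    -- Every column in the partitioning expression must be part of the primary key.
    PRIMARY KEY (`date_id`, `year_col`)
)
ENGINE=InnoDB
PARTITION BY HASH (`year_col`)
PARTITIONS 11;

INSERT INTO `tmp_year_table` (`date_id`, `date`, `year_col`)
VALUES
(9,  '2008-01-01', 2008),
(10, '2009-01-01', 2009),
(11, '2010-01-01', 2010);

-- Filter on the INT column for pruning and on the DATE column for the exact range.
-- On MySQL 8.0, plain EXPLAIN replaces EXPLAIN PARTITIONS.
EXPLAIN PARTITIONS
SELECT `date`
FROM `tmp_year_table`
WHERE `year_col` >= 2008 AND `year_col` <= 2009
  AND `date` >= '2008-01-01' AND `date` <= '2009-01-01';

DROP TABLE IF EXISTS `tmp_year_table`;
```

The cost of this design is that the application (or a trigger) has to keep year_col consistent with the date column.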

That seems counterintuitive to me, but part of the deal is that you always have to specify the number of partitions up front (in your case, 11), so the partition is calculated like this:

If you insert a record whose date value is '2010-09-01', then the partition in which it is stored is determined as follows:

MOD(YEAR('2010-09-01'),11)
=  MOD(2010,11)
=  8

So that will go into partition 8 rather than the eleventh partition, and likewise:

MOD(YEAR('2000-09-01'),11)
=  MOD(2000,11)
=  9

Your first year would go into partition 9. It would use the correct partition if you queried on a single date:

WHERE date = "2010-01-01"

But not on a range.
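The arithmetic above can be checked directly in MySQL. The sketch below assumes the tmp_date_table from the question's test.sql still exists with its 11 HASH partitions; the column aliases are made up for illustration.

```sql
-- Which of the 11 HASH partitions the two boundary years of the
-- question's range map to: MOD(2008, 11) = 6 and MOD(2009, 11) = 7.
SELECT MOD(YEAR('2008-01-01'), 11) AS part_2008,
       MOD(YEAR('2009-01-01'), 11) AS part_2009;

-- Equality on the DATE column: the case this answer expects to prune.
EXPLAIN PARTITIONS
SELECT `date` FROM `tmp_date_table` WHERE `date` = '2010-01-01';

-- Range on the DATE column: the case from the question, for comparison.
EXPLAIN PARTITIONS
SELECT `date` FROM `tmp_date_table`
WHERE `date` >= '2008-01-01' AND `date` <= '2009-01-01';
```

Comparing the partitions column of the two EXPLAIN results shows whether pruning happened on a given server version.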

Since the range of your data is known, and it all looks historical, you will have to bite the bullet and set up a RANGE partition for each year. This way, however, your range query will use only the correct partitions when you use BETWEEN.

DROP TABLE IF EXISTS `tmp_date_table`;

CREATE TABLE `tmp_date_table` (
    `date_id` INT(11) NOT NULL,
    `dates` DATE NOT NULL
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
PARTITION BY RANGE ( YEAR(`dates`) ) (
    PARTITION p0 VALUES LESS THAN (2001),
    PARTITION p1 VALUES LESS THAN (2002),
    PARTITION p2 VALUES LESS THAN (2003),
    PARTITION p3 VALUES LESS THAN (2004),
    PARTITION p4 VALUES LESS THAN (2005),
    PARTITION p5 VALUES LESS THAN (2006),
    PARTITION p6 VALUES LESS THAN (2007),
    PARTITION p7 VALUES LESS THAN (2008),
    PARTITION p8 VALUES LESS THAN (2009),
    PARTITION p9 VALUES LESS THAN (2010),
    PARTITION p10 VALUES LESS THAN (2011),
    PARTITION p11 VALUES LESS THAN MAXVALUE
);

INSERT INTO `tmp_date_table`(date_id, `dates`) 
VALUES
(1, '2000-01-01'),
(2, '2001-01-01'),
(3, '2002-01-01'),
(4, '2003-01-01'),
(5, '2004-01-01'),
(6, '2005-01-01'),
(7, '2006-01-01'),
(8, '2007-01-01'),
(9, '2008-01-01'),
(10, '2009-01-01'),
(11, '2010-01-01'),
(12, '2012-01-01');



EXPLAIN PARTITIONS
SELECT dates FROM tmp_date_table WHERE (`dates`) BETWEEN  "2001-01-01" and "2004-01-01" ;

DROP TABLE IF EXISTS `tmp_date_table`;
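To cross-check where the rows actually landed, one option is to look at the partition metadata and to read specific partitions explicitly. This is only a sketch: it assumes it is run before the final DROP TABLE above, and that p1 through p4 hold the years 2001 through 2004, as in the RANGE definition.

```sql
-- Rows per partition. TABLE_ROWS is only an estimate for InnoDB tables.
SELECT PARTITION_NAME, PARTITION_DESCRIPTION, TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'tmp_date_table'
ORDER BY PARTITION_ORDINAL_POSITION;

-- Explicit partition selection (MySQL 5.6 and later): read only the
-- partitions that the BETWEEN query above is expected to touch.
SELECT `date_id`, `dates`
FROM `tmp_date_table` PARTITION (p1, p2, p3, p4)
ORDER BY `dates`;
```

If the second query returns the same rows as the BETWEEN query, the range is fully covered by those four partitions.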

You have found a major reason why PARTITION BY HASH is virtually useless.

But, more basic... WHY do this?

CREATE TABLE `tmp_date_table` (
    `date_id` INT(11) NOT NULL,
    `date` DATE NOT NULL,
    PRIMARY KEY (`date_id`, `date`)
)

Are you trying to 'normalize' dates to date_ids?

  1. date_id is INT, which occupies 4 bytes. DATE occupies only 3 bytes. So this normalization wastes space.

  2. Don't normalize "continuous" things such as numbers, dates, floats, etc. It prevents you from efficiently looking up "ranges" of such values.
