
How to partition a MyISAM table by day in MySQL

I want to keep the last 45 days of log data in a MySQL table for statistical reporting purposes. Each day could be 20-30 million rows. I'm planning on creating a flat file and using LOAD DATA INFILE to get the data in there each day. Ideally I'd like to have each day on its own partition without having to write a script to create a partition every day.
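For reference, a daily bulk load along those lines might look like the following rough sketch; the file path, delimiters, and column list are placeholders, not details from the original post:

-- Hypothetical daily import; adjust the path, delimiters, and column list to the real log format
LOAD DATA INFILE '/var/log/app/2014-01-01.tsv'
INTO TABLE logs
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(log_date, message);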

Is there a way in MySQL to just say each day gets its own partition automatically?

Thanks

Er... number them mod 45 with a composite key and cycle through them...

Seriously, one table per day was a valid suggestion, and since it is static data I would create packed MyISAM tables, depending on my host's ability to sort.

Building queries to union some or all of them would be only moderately challenging.
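For example, assuming hypothetical daily tables named logs_day_01 through logs_day_45 with a log_date column (names not from the original answer), a report over the three most recent days could union them like this:

SELECT log_date, COUNT(*) AS hits
FROM (
  SELECT log_date FROM logs_day_43
  UNION ALL
  SELECT log_date FROM logs_day_44
  UNION ALL
  SELECT log_date FROM logs_day_45
) AS recent_days
GROUP BY log_date;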

One table per day, and partition those to improve load performance.

Yes, you can partition MySQL tables by date:

CREATE TABLE ExampleTable (
  id INT AUTO_INCREMENT,
  d DATE,
  PRIMARY KEY (id, d)
) PARTITION BY RANGE COLUMNS(d) (
  PARTITION p1 VALUES LESS THAN ('2014-01-01'),
  PARTITION p2 VALUES LESS THAN ('2014-01-02'),
  PARTITION pN VALUES LESS THAN (MAXVALUE)
);

Later, when you get close to overflowing into partition pN, you can split it:

ALTER TABLE ExampleTable REORGANIZE PARTITION pN INTO (
  PARTITION p3 VALUES LESS THAN ('2014-01-03'), 
  PARTITION pN VALUES LESS THAN (MAXVALUE)
);

This doesn't automatically partition by date, but you can reorganize when you need to. Best to reorganize before you fill the last partition, so the operation will be quick.
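Since the goal is to keep only 45 days, the oldest day can be expired by dropping its partition, which discards that partition's rows almost instantly. A sketch against the ExampleTable above:

ALTER TABLE ExampleTable DROP PARTITION p1;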

I stumbled on this question while looking for something else and wanted to point out the MERGE storage engine ( http://dev.mysql.com/doc/refman/5.7/en/merge-storage-engine.html ).

The MERGE storage engine is more or less a simple pointer to multiple tables, and it can be redone in seconds. For cycling logs, it can be very powerful! Here's what I'd do:

Create one table per day, and use LOAD DATA as the OP mentioned to fill it up. Once that is done, drop the MERGE table and recreate it, including the new table and omitting the oldest one. After that, the old table can be deleted or archived. This allows rapidly querying a specific day, or all days, since both the original tables and the MERGE table remain valid.

-- New day's table, same MyISAM definition as the previous day (CREATE TABLE ... LIKE cannot take an ENGINE option)
CREATE TABLE logs_day_46 LIKE logs_day_45;
-- Rebuild the MERGE wrapper over the 45 most recent daily tables
DROP TABLE IF EXISTS logs;
CREATE TABLE logs LIKE logs_day_46;
ALTER TABLE logs ENGINE=MERGE UNION=(logs_day_2,[...],logs_day_46);
-- The oldest day drops out of the window
DROP TABLE logs_day_1;

Note that a MERGE table is not the same as a partitioned one, and it has its own advantages and drawbacks. But do remember that if you are trying to aggregate across all tables, it will be slower than if all the data were in a single table (the same is true for partitions, as they are basically separate tables under the hood). If you are going to query mostly on specific days, you will need to pick the table yourself, but if partitions are defined on the day values, MySQL will automatically prune to the correct table(s), which might turn out faster and easier to write.
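To illustrate the trade-off, assuming the daily tables have a date column d (a column name invented for this sketch), a single day can be read from that day's table directly, while the MERGE table is convenient when the range spans several days:

-- One specific day: query that day's MyISAM table directly
SELECT COUNT(*) FROM logs_day_45;

-- A range of days: query the MERGE table, which reads all underlying tables
SELECT d, COUNT(*) FROM logs WHERE d >= '2013-12-30' GROUP BY d;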

I would strongly suggest using Redis or Cassandra rather than MySQL to store high-traffic data such as logs. Then you could stream the data in all day long rather than doing daily imports.

You can read more on those two (and others) in this comparison of "NoSQL" databases.

If you insist on MySQL, I think the easiest approach would be to create a new table per day, like logs_2011_01_13, and then load everything into it. It makes dropping older dates very easy, and you could also easily move different tables to different servers.
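A rough sketch of that daily rotation, assuming a template table holding the log schema (the template and table names here are illustrative):

-- Create today's table from a hypothetical template, then expire the day that just left the 45-day window
CREATE TABLE logs_2011_01_13 LIKE logs_template;
DROP TABLE IF EXISTS logs_2010_11_29;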
