
MySQL zero-downtime ALTER TABLE sorting (Percona, openark, etc.)

I need to sort a huge MyISAM table in a particular order so that SELECTs are faster under certain conditions.

Please note that this question is about how to run an ALTER TABLE that sorts the table by a specific column order with zero downtime. It is not a duplicate of other questions about the more general case.

A simple way to achieve this is something like:

ALTER table mytable ORDER BY col1, col2;

We could also use myisamchk --sort-records (which sorts the rows according to a given index) to get the same result.

Either way, both approaches make queries like this one very fast:

SELECT * FROM mytable WHERE col1 = x ORDER BY col2;

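Note that the problem is not the index itself but fetching large amounts of ordered data from the table: when the matching rows are scattered across the data file, reading them requires many more disk reads than when they are stored next to each other.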
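To illustrate why physical row order matters even when an index exists, here is a toy model, not a description of MyISAM internals. It assumes the table is stored in fixed-size pages of a hypothetical PAGE_SIZE rows and counts how many distinct pages a `WHERE col1 = x` lookup must read, once for a fragmented layout and once for a layout sorted by (col1, col2). The helper names are illustrative only.

```python
import random

PAGE_SIZE = 100  # hypothetical number of rows per data page (assumption)


def make_rows(n_keys=100, rows_per_key=100):
    """Build (col1, col2) rows: n_keys distinct col1 values, rows_per_key rows each."""
    return [(k, j) for k in range(n_keys) for j in range(rows_per_key)]


def pages_touched(layout, key, page_size=PAGE_SIZE):
    """Count distinct pages that hold at least one row with col1 == key.

    `layout` lists rows in physical storage order; row i lives on page i // page_size.
    """
    return len({pos // page_size for pos, (col1, _) in enumerate(layout) if col1 == key})


rows = make_rows()
fragmented = rows[:]
random.Random(42).shuffle(fragmented)  # rows scattered, as in a fragmented table
sorted_layout = sorted(rows)           # roughly what ORDER BY col1, col2 produces

print("sorted:", pages_touched(sorted_layout, 7))
print("fragmented:", pages_touched(fragmented, 7))
```

In the sorted layout all 100 rows for `col1 = 7` sit on a single page (and are already in col2 order), while in the shuffled layout they are spread over dozens of pages. This mirrors the answer's observation that OPTIMIZE TABLE alone does not help when values stay randomly distributed.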

So far that ALTER has been working well. The problem now is that the ALTER command is slow and it locks the database.

I believe we could use the Percona or openark tools to do the same operation. Something like this:

pt-online-schema-change --alter "ENGINE=MyISAM, ORDER BY col1, col2" D=mydatabase,t=mytable -u root --dry-run

Internally, this creates a new table, copies the rows into it, and then swaps the table names. This is well documented.

However, I'm not sure whether or how Percona will honour the ORDER BY. I cannot see it happening anywhere in the dry-run logs (though this may be normal), and the documentation does not explain it.

Does anyone know how Percona will apply the ORDER BY to the table?

  1. Will it do the ordering on the new table (_mytable_new) after mytable is copied and before renaming?
  2. Will it do the ordering during the copy from mytable, as in "INSERT INTO _mytable_new SELECT * FROM mytable ORDER BY col1, col2"?
  3. Or perhaps the ORDER BY will never be done?
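Option 2 above (ordering the rows during the copy, then swapping the names) can be sketched as follows. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL and a hypothetical helper named rebuild_sorted; in SQLite, rowid order is used here as a proxy for physical insertion order. Unlike pt-online-schema-change, it installs no triggers, so writes made to mytable during the copy would be lost; it is not zero-downtime on its own.

```python
import sqlite3


def rebuild_sorted(conn, table, order_cols):
    """Copy `table` into a shadow table in sorted order, then swap the names.

    Hypothetical helper: SQLite stands in for MySQL, and unlike
    pt-online-schema-change no triggers capture concurrent writes.
    Table and column names are interpolated, so they must be trusted.
    """
    new, old = f"_{table}_new", f"_{table}_old"
    cols = ", ".join(order_cols)
    # Empty table with the same columns (indexes are not copied in this sketch).
    conn.execute(f"CREATE TABLE {new} AS SELECT * FROM {table} WHERE 0")
    # Rows are inserted in sorted order, so the new rowids follow the sort columns.
    conn.execute(f"INSERT INTO {new} SELECT * FROM {table} ORDER BY {cols}")
    # Swap the names, then drop the old copy.
    conn.execute(f"ALTER TABLE {table} RENAME TO {old}")
    conn.execute(f"ALTER TABLE {new} RENAME TO {table}")
    conn.execute(f"DROP TABLE {old}")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 2, 9), (2, 1, 5), (3, 2, 1), (4, 1, 3)],
)
conn.commit()
rebuild_sorted(conn, "mytable", ["col1", "col2"])
print(conn.execute("SELECT id FROM mytable ORDER BY rowid").fetchall())
```

After the rebuild, the rows are stored in (col1, col2) order. The missing piece for a true online change is capturing concurrent writes during the copy, which is what pt-online-schema-change's triggers are for.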

EDIT: I ran PTDEBUG=1 ./pt-online-schema-change --alter "ENGINE=MyISAM, ORDER BY col1, col2" on the test server.

After checking the logs, I found that the ORDER BY is not being applied. Any ideas? Does openark allow this?

Thanks!

There's a way to achieve an ORDER BY with pt-online-schema-change.

First, make sure you have an index on the column you want to ORDER BY. Then run the Percona tool with the --chunk-index option so that the chosen index is used to fetch the rows from the original table.

There's a problem, though: Percona won't be able to fetch rows when the index has poor selectivity. In that case, create a composite index on the column you need to sort by plus, for example, ID (or any other column with high cardinality). It will be slow, but it may be a way to get the rows sorted online.
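The reason the extra high-cardinality column helps can be shown with a simplified model of chunking along an index. This is not pt-online-schema-change's actual code: it is a minimal keyset-pagination sketch, assuming SQLite as a stand-in for MySQL and a hypothetical helper copy_in_chunks. Because (col1, col2, id) is unique, every chunk can end at an exact row even when col1 alone has only a handful of distinct values, and the rows reach the new table in index order.

```python
import sqlite3


def copy_in_chunks(conn, src, dst, chunk_size=1000):
    """Copy src into dst in chunks walked along the composite key (col1, col2, id).

    Hypothetical, simplified model of chunking on an index (not pt-osc code).
    The unique trailing `id` makes each chunk boundary point at exactly one row.
    """
    last = None  # (col1, col2, id) of the last row copied so far
    while True:
        if last is None:
            where, params = "1", ()
        else:
            c1, c2, i = last
            # Lexicographic "(col1, col2, id) > last", spelled out for portability.
            where = "col1 > ? OR (col1 = ? AND (col2 > ? OR (col2 = ? AND id > ?)))"
            params = (c1, c1, c2, c2, i)
        rows = conn.execute(
            f"SELECT id, col1, col2 FROM {src} WHERE {where} "
            f"ORDER BY col1, col2, id LIMIT ?",
            params + (chunk_size,),
        ).fetchall()
        if not rows:
            break
        conn.executemany(f"INSERT INTO {dst} (id, col1, col2) VALUES (?, ?, ?)", rows)
        conn.commit()  # one short transaction per chunk
        last_id, last_c1, last_c2 = rows[-1]
        last = (last_c1, last_c2, last_id)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, col1 INTEGER, col2 INTEGER)")
conn.execute("CREATE INDEX idx_sort ON mytable (col1, col2, id)")
conn.execute("CREATE TABLE _mytable_new (id INTEGER, col1 INTEGER, col2 INTEGER)")
# Low-selectivity col1: only 3 distinct values across 30 rows.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(i, i % 3, (i * 7) % 5) for i in range(1, 31)],
)
conn.commit()
copy_in_chunks(conn, "mytable", "_mytable_new", chunk_size=4)
```

With a chunk size of 4 and 10 rows per col1 value, a boundary on col1 alone could not split a run of equal values; the composite key can.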

I got a 10x speedup on a large, very fragmented table of 100M rows. OPTIMIZE TABLE without column sorting did not improve the situation, since the values were randomly distributed across a table of 8GB. I hope this finding helps others.
