
Can I change the distribution method on an existing Citus table?

During a migration from MySQL into a Citus cluster, I used the range distribution method. The migration is complete, but now I'd like to change the distribution method to hash.

Is there a way to change the distribution method from range to hash for an existing table with data already in it?

I came up with the following procedure, but am not sure it's valid:

  1. Update the minvalue and maxvalue columns of the pg_dist_shard table for all shards being changed
  2. Update the shard storage type column of the pg_dist_partition table from r to h
  3. COMMIT;

That is a good question. Currently, Citus does not provide a direct way to change the partition type of existing data.

In range partitioning, records are placed in shards according to their partition column value and the shard min/max values. If a record x resides in shard y, then y.minvalue <= x.partition_column <= y.maxvalue.

In hash partitioning, the partition column is hashed and records are routed according to this hashed value. Therefore, the min/max values you see in pg_dist_shard are the boundary values for the result of the hash function. In this case, y.minvalue <= hash(x.partition_column) <= y.maxvalue.
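To make the difference concrete, here is a small Python sketch of the two routing rules. It is only an illustration: the shard boundaries, the ids 1 to 1000, and `stand_in_hash` (built on CRC32) are assumptions made up for this example, and `stand_in_hash` is not the hash function Citus actually uses.

```python
import zlib

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def make_shards(lo, hi, count):
    """Split [lo, hi] into `count` contiguous (minvalue, maxvalue) intervals."""
    step = (hi - lo + 1) // count
    shards = []
    for i in range(count):
        start = lo + i * step
        # The last shard absorbs any remainder so the whole range is covered.
        end = hi if i == count - 1 else start + step - 1
        shards.append((start, end))
    return shards


def stand_in_hash(value):
    """Map a value to a signed 32-bit integer (assumption: NOT Citus's hash)."""
    return zlib.crc32(str(value).encode()) - 2**31


def find_shard(shards, key):
    """Return the index of the shard whose [minvalue, maxvalue] contains key."""
    for index, (lo, hi) in enumerate(shards):
        if lo <= key <= hi:
            return index
    raise ValueError(f"no shard covers {key}")


def route_range(shards, value):
    # Range: compare the partition column value itself with the boundaries.
    return find_shard(shards, value)


def route_hash(shards, value):
    # Hash: compare the hashed value with the boundaries.
    return find_shard(shards, stand_in_hash(value))


# Hypothetical table with ids 1..1000 spread over four shards.
range_shards = make_shards(1, 1000, 4)
hash_shards = make_shards(INT32_MIN, INT32_MAX, 4)

# Rows whose current (range) shard differs from the shard hash routing expects.
misplaced = sum(
    1
    for row_id in range(1, 1001)
    if route_range(range_shards, row_id) != route_hash(hash_shards, row_id)
)
print(f"{misplaced} of 1000 rows would be looked up in the wrong shard")
```

Each row stays physically in the shard that range routing picked for it, while hash routing would look for it in the shard chosen by the hashed value, so the two generally disagree.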

Therefore, making the changes you have mentioned would end up with an incorrect distribution: the metadata would describe hash boundaries, but the rows would still sit in the shards chosen by their range values. In order to switch from range partitioning to hash partitioning, the data has to be re-distributed. To do that, I suggest reloading the data into an empty hash-partitioned table.
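As a rough sketch of what such a reload could look like, the Python helper below only builds the SQL statements; it does not connect to a database. The table and column names are hypothetical. It assumes a Citus release that provides `create_distributed_table` (older releases used a different set of functions) and one that allows `INSERT ... SELECT` between these two tables; if yours does not, dumping the old table and loading the file with `COPY` is another way to do the same reload.

```python
def reload_statements(old_table, new_table, distribution_column):
    """Build SQL that copies old_table into a new hash-distributed table.

    Sketch only: identifiers are interpolated without quoting, so pass
    trusted, simple names.
    """
    return [
        # 1. Create an empty table with the same columns and constraints.
        f"CREATE TABLE {new_table} (LIKE {old_table} INCLUDING ALL);",
        # 2. Distribute the empty table by hash on the chosen column.
        f"SELECT create_distributed_table('{new_table}', "
        f"'{distribution_column}', 'hash');",
        # 3. Reload the rows so each one is routed by its hashed value.
        f"INSERT INTO {new_table} SELECT * FROM {old_table};",
    ]


# Hypothetical names, for illustration only.
statements = reload_statements("events", "events_hash", "customer_id")
for statement in statements:
    print(statement)
```

Once the row counts of the two tables match, the old table can be dropped and the new one renamed in its place.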

For more information, you can refer to the "Working with Distributed Tables" and "Hash Distribution" sections of the Citus documentation.
