
What are the settings to look out for with Citus PostgreSQL?

We are looking into using CitusDB. After reading all the documentation we are not clear on some fundamentals. Hoping somebody can give some directions.

In Citus you specify a shard_count and a shard_max_size. According to the docs these settings are set on the coordinator (but, oddly, they can also be set on a worker node).
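For reference, this is roughly how we expect to apply them when distributing a table; a minimal sketch with a hypothetical users table distributed by a client_id column:

```sql
-- Shard-related settings are GUCs read on the coordinator when a table is distributed.
SET citus.shard_count = 64;          -- number of shards for new hash-distributed tables
SET citus.shard_max_size = '1GB';    -- only relevant for append-distributed tables

-- Distribute the table by its tenant column; the current citus.shard_count applies here.
SELECT create_distributed_table('users', 'client_id');
```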

What happens when you specify 1000 shards and distribute 10 tables with 100 clients?

  1. Does it create a shard for every table (users_1, users_2, shops_1, etc.), effectively using all 1000 shards?

  2. If we grow by another 100 clients and have already hit the 1000 limit, how are these tables partitioned then?

  3. The shard_max_size defaults to 1GB. If a shard grows beyond 1GB a new shard is created, but what happens when the shard_count has already been reached?

  4. Lastly, is it advisable to go for 3000 shards? We read in the docs that 128 is advised for a SaaS, but this seems low if you have 100 clients * 10 tables. (I know it depends... but...)

Former Citus/current Microsoft employee here, chiming in with some advice.

Citus shards are based on integer hash ranges of the distribution key. When a row is inserted, the value of the distribution key is hashed, the planner looks up which shard was assigned the range of hash values that the key falls into, then looks up which worker the shard lives on, and then runs the insert on that worker. This means that the customers are divided up across shards in a roughly even fashion, and when you add a new customer it'll just go into an existing shard.
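If you want to see that mapping yourself, the hash ranges and shard assignments live in the Citus metadata catalogs. A minimal sketch (the users table name and the key value 42 are just examples):

```sql
-- Hash range assigned to each shard of a distributed table
SELECT shardid, shardminvalue, shardmaxvalue
FROM pg_dist_shard
WHERE logicalrelid = 'users'::regclass;

-- Which shard a given distribution-key value hashes into
SELECT get_shard_id_for_distribution_column('users', 42);
```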

It is critically important that all distributed tables that you wish to join to each other have the same number of shards and that their distribution columns have the same type. This lets us perform joins entirely on workers, which is awesome for performance.
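In practice you enforce this by creating the tables with the same citus.shard_count and, ideally, by co-locating them explicitly. A sketch assuming hypothetical users and orders tables that both carry a client_id of the same type:

```sql
SET citus.shard_count = 128;

SELECT create_distributed_table('users',  'client_id');
SELECT create_distributed_table('orders', 'client_id',
                                colocate_with => 'users');  -- same shard count, same column type

-- A join like this can then run entirely on the workers:
-- SELECT ... FROM orders o JOIN users u USING (client_id) WHERE u.client_id = 42;
```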

If you've got a super big customer (100x as much data as your average customer is a decent heuristic), I'd use the tenant isolation features in advance to give them their own shard. This will make moving them to dedicated hardware much easier if you decide to do so down the road.
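The function for that is isolate_tenant_to_new_shard; a hedged sketch, where the tenant id and the 'CASCADE' option (apply to all co-located tables) are placeholder values:

```sql
-- Split the big tenant's rows out into a dedicated shard on all co-located tables.
SELECT isolate_tenant_to_new_shard('users', 1234, 'CASCADE');
```

Once isolated, that shard can later be moved to its own worker (e.g. with citus_move_shard_placement) if you decide to give the tenant dedicated hardware.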

The shard_max_size setting has no effect on hash-distributed tables. Shards will grow without limit as you keep inserting data, and hash-distributed tables will never increase their shard count under normal operations. This setting only applies to append distribution, which is pretty rarely used these days (I can think of one or two companies using it, but that's about it).
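Because hash-distributed shards have no size cap, the useful thing to watch is how big they actually get. A hedged sketch using Citus' size helpers (table name illustrative; the citus_shards view is available in recent versions):

```sql
-- Total size of a distributed table across all of its shards
SELECT pg_size_pretty(citus_total_relation_size('users'));

-- Per-shard sizes and where each shard is placed
SELECT shardid, nodename, pg_size_pretty(shard_size) AS size
FROM citus_shards
WHERE table_name = 'users'::regclass;
```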

I'd strongly advise against changing citus.shard_count to 3000 for the use case you've described. 64 or 128 is probably correct, and I'd consider 256 if you're looking at >100TB of data. It's perfectly fine if you end up having thousands of distributed tables and each one has 128 shards, but it's better to keep the number of shards per table reasonable.
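And if you ever do need to change the shard count after the fact, newer Citus versions (10.0+) can do it without recreating the table; a sketch, cascading to co-located tables so joins stay worker-local:

```sql
SELECT alter_distributed_table('users',
                               shard_count := 128,
                               cascade_to_colocated := true);
```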
