
Why does Ceph calculate the PG ID from an object hash rather than with the CRUSH algorithm?

Ceph uses the CRUSH algorithm for the PG->OSD mapping, and it copes well with OSD nodes being added or removed.

But for the obj->PG mapping, Ceph still uses a traditional hash: pgid = hash(obj_name) % pg_num. This approach may lead to massive data migration if we change the number of PGs, and may even reduce the availability of the system.
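To make the concern concrete, here is a minimal sketch that applies the formula exactly as written above and counts how many objects land in a different PG after pg_num changes. The hash function (a truncated SHA-256), the object names and the helper names are assumptions chosen for illustration; they are not Ceph's own hash or API.

```python
import hashlib

def stable_hash(name: str) -> int:
    """Deterministic 32-bit hash; a stand-in, not Ceph's object-name hash."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest()[:4], "little")

def pg_for(name: str, pg_num: int) -> int:
    """The mapping as written in the question: pgid = hash(obj_name) % pg_num."""
    return stable_hash(name) % pg_num

def moved_fraction(names, old_pg_num: int, new_pg_num: int) -> float:
    """Fraction of objects whose PG ID differs between the two pg_num values."""
    moved = sum(pg_for(n, old_pg_num) != pg_for(n, new_pg_num) for n in names)
    return moved / len(names)

# Hypothetical object names, only used to exercise the mapping.
names = [f"obj-{i}" for i in range(10_000)]
frac_plus_one = moved_fraction(names, 128, 129)
frac_double = moved_fraction(names, 128, 256)
print(f"128 -> 129 PGs: {frac_plus_one:.1%} of objects change PG")
print(f"128 -> 256 PGs: {frac_double:.1%} of objects change PG")
```

With a plain modulo, adding a single PG remaps almost every object, while doubling pg_num remaps roughly half of them.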

Why doesn't Ceph use a CRUSH algorithm (say, straw2) for the obj->PG mapping, which could keep data migration close to optimal when the number of PGs changes?
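For comparison, the alternative being asked about can be sketched as a "largest draw wins" selection. This is a simplified stand-in, assuming equal weights for all PGs, in which case a straw2-style choice should reduce to picking the candidate with the largest per-candidate hash; it is not Ceph's straw2 code, and the function names are hypothetical.

```python
import hashlib

def score(obj_name: str, pg: int) -> int:
    """Pseudo-random draw for an (object, PG) pair; illustrative hash only."""
    data = f"{obj_name}/{pg}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "little")

def pg_by_max_draw(obj_name: str, pg_num: int) -> int:
    """Pick the PG with the largest draw; costs one hash per candidate PG."""
    return max(range(pg_num), key=lambda pg: score(obj_name, pg))

# Hypothetical object names, only used to exercise the mapping.
hrw_names = [f"obj-{i}" for i in range(2_000)]
before = {n: pg_by_max_draw(n, 128) for n in hrw_names}
after = {n: pg_by_max_draw(n, 129) for n in hrw_names}
hrw_moved = [n for n in hrw_names if before[n] != after[n]]
print(f"128 -> 129 PGs: {len(hrw_moved) / len(hrw_names):.1%} of objects change PG")
```

Only objects that the new PG "wins" are moved, so the migration is about 1/129 of the data. The price in this sketch is that every lookup evaluates one hash per PG, instead of a single hash and a modulo.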

There are different scenarios, and I don't think CRUSH is a silver bullet.

  1. PG->OSD is a one-to-many function (each PG maps to a set of OSDs), while obj->PG is a one-to-one function.
  2. OSDs are added and removed fairly frequently, while the number of PGs is considered fairly stable.
  3. The OSD group can be partially unavailable, while the set of PGs cannot.
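On the stability of PGs: as far as I can tell from the Ceph source, the obj->PG step is not a bare modulo. The hash goes through a helper named ceph_stable_mod together with pg_num and a power-of-two mask. The sketch below is a Python transcription of that idea under that assumption; the hash function and the names obj_hash, pg_mask and stable_pg are illustrative, not Ceph's.

```python
import hashlib

def obj_hash(name: str) -> int:
    """Deterministic 32-bit hash; a stand-in, not Ceph's object-name hash."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest()[:4], "little")

def pg_mask(pg_num: int) -> int:
    """Smallest mask of the form 2^k - 1 that covers the IDs 0 .. pg_num - 1."""
    return (1 << (pg_num - 1).bit_length()) - 1

def stable_mod(x: int, b: int, bmask: int) -> int:
    """Keep the masked value if it is a valid ID, else drop the top mask bit."""
    return x & bmask if (x & bmask) < b else x & (bmask >> 1)

def stable_pg(name: str, pg_num: int) -> int:
    return stable_mod(obj_hash(name), pg_num, pg_mask(pg_num))

def moves(names, old_pg_num: int, new_pg_num: int):
    """(old PG, new PG) pairs for the objects whose PG changes."""
    pairs = ((stable_pg(n, old_pg_num), stable_pg(n, new_pg_num)) for n in names)
    return [(old, new) for old, new in pairs if old != new]

# Hypothetical object names, only used to exercise the mapping.
sm_names = [f"obj-{i}" for i in range(10_000)]
moved_129 = moves(sm_names, 128, 129)
moved_256 = moves(sm_names, 128, 256)
print(f"128 -> 129 PGs: {len(moved_129) / len(sm_names):.1%} of objects change PG")
print(f"128 -> 256 PGs: {len(moved_256) / len(sm_names):.1%} of objects change PG")
```

With this scheme, growing from 128 to 129 PGs only moves objects out of PG 0 into the new PG 128, and doubling to 256 sends each moved object from PG p to PG p + 128. A pg_num change then looks like splitting existing PGs rather than reshuffling everything.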

This is just my view; criticism and discussion are welcome.
