
Why does Ceph calculate the PG ID from an object hash rather than with the CRUSH algorithm?

Original post: 2020-11-29 16:35:09 | Tags: ceph, cephfs, radosgw, crush

Ceph uses the CRUSH algorithm for the PG->OSD mapping, and it copes well with OSD nodes being added or removed.

But for the obj->PG mapping, Ceph still uses a traditional hash, namely pgid = hash(obj_name) % pg_num. This approach may lead to massive data migration if we change the number of PGs, and may even reduce the availability of the system.
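To make the migration concern concrete, here is a minimal sketch of the formula exactly as written above. The hash (SHA-256 truncated to 32 bits), the object names, and the helper names obj_hash, pg_for and moved_fraction are assumptions chosen for illustration; none of them is Ceph's actual code.

```python
import hashlib


def obj_hash(name: str) -> int:
    """Deterministic 32-bit hash of an object name (placeholder, not Ceph's hash)."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest()[:4], "big")


def pg_for(name: str, pg_num: int) -> int:
    """The formula from the question: pgid = hash(obj_name) % pg_num."""
    return obj_hash(name) % pg_num


def moved_fraction(names, old_pg_num: int, new_pg_num: int) -> float:
    """Fraction of objects whose PG ID differs between the two pg_num values."""
    moved = sum(1 for n in names if pg_for(n, old_pg_num) != pg_for(n, new_pg_num))
    return moved / len(names)


names = [f"obj-{i}" for i in range(10_000)]  # hypothetical object names
frac_plus_one = moved_fraction(names, 128, 129)
frac_doubled = moved_fraction(names, 128, 256)
print(f"128 -> 129 PGs: {frac_plus_one:.1%} of objects change PG")
print(f"128 -> 256 PGs: {frac_doubled:.1%} of objects change PG")
```

With a uniform hash, only about 1 object in 129 keeps its PG ID when pg_num goes from 128 to 129, while doubling pg_num leaves about half of the objects where they were. This models the plain modulo as stated in the question. As far as I know, Ceph's own code applies a "stable mod" variant rather than a plain %, so the sketch should not be read as a measurement of real Ceph behaviour.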

Why doesn't Ceph use a CRUSH algorithm (say, straw2) for the obj->PG mapping, which could achieve the optimal amount of data migration when the number of PGs changes?
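For reference, the alternative being asked about could look roughly like the sketch below: a straw2-style selection in which every PG makes a hashed "draw" for the object and the largest draw wins. This is a simplified stand-in, assuming equal weights, a SHA-256 based hash, and the hypothetical helper names draw and pg_by_straw. It is not CRUSH's actual straw2 implementation.

```python
import hashlib
import math


def draw(name: str, pg: int, weight: float = 1.0) -> float:
    """straw2-style draw: ln(u) / weight, with u uniform in (0, 1]."""
    digest = hashlib.sha256(f"{name}:{pg}".encode()).digest()
    u = (int.from_bytes(digest[:8], "big") + 1) / 2**64
    return math.log(u) / weight


def pg_by_straw(name: str, pg_num: int) -> int:
    """Pick the PG with the largest draw (all weights assumed equal)."""
    return max(range(pg_num), key=lambda pg: draw(name, pg))


names = [f"obj-{i}" for i in range(5_000)]  # hypothetical object names
# (old PG, new PG) for every object that changes PG when pg_num goes 16 -> 17
moves = [
    (old, new)
    for old, new in ((pg_by_straw(n, 16), pg_by_straw(n, 17)) for n in names)
    if old != new
]
frac_moved = len(moves) / len(names)
print(f"16 -> 17 PGs: {frac_moved:.1%} of objects move")
```

In this sketch an object moves only when the newly added PG wins its draw, so roughly 1/17 of the objects move and all of them land in the new PG. The price is that each lookup evaluates one draw per PG, whereas the modulo formula is a single operation.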

1 Answer

The two mappings serve different scenarios, and I don't think CRUSH is a silver bullet.

PG->OSD is a one-to-many function, while obj->PG is a one-to-one function.
OSDs are added and removed fairly frequently, while the number of PGs is considered fairly stable.
A group of OSDs can be partially unavailable, whereas a PG cannot.
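The first and third points can be sketched as follows. A PG maps to a set of OSDs, and any member of that set may disappear, so the mapping has to route around the missing device while disturbing as little as possible. The code uses a flat top-k hash ranking as a stand-in for CRUSH, ignoring hierarchy, weights and failure domains. The OSD names and the helpers score and osds_for_pg are assumptions for illustration.

```python
import hashlib


def score(pg: int, osd: str) -> int:
    """Deterministic pseudo-random score for a (PG, OSD) pair."""
    digest = hashlib.sha256(f"{pg}:{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def osds_for_pg(pg: int, osds, replicas: int = 3):
    """One-to-many: return the `replicas` highest-scoring OSDs for this PG."""
    return sorted(osds, key=lambda osd: score(pg, osd), reverse=True)[:replicas]


osds = [f"osd.{i}" for i in range(8)]  # hypothetical cluster
before = {pg: osds_for_pg(pg, osds) for pg in range(32)}

# Simulate one OSD becoming unavailable.
survivors = [o for o in osds if o != "osd.3"]
after = {pg: osds_for_pg(pg, survivors) for pg in range(32)}

untouched = [pg for pg in before if "osd.3" not in before[pg]]
print(f"{len(untouched)} of 32 PGs keep exactly the same OSD set")
```

PGs that never included osd.3 keep their OSD set unchanged, and the affected PGs keep their two remaining OSDs and gain one replacement. The obj->PG side has no comparable failure case: an object maps to exactly one PG, and a PG is a logical bucket that does not go offline by itself.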

This is just my understanding; criticism and discussion are welcome.
