
Understanding the mechanism of CRUSH rules in Ceph

I would like to know the difference between these 2 rules:

# rules
rule rack_rule{
 ruleset 0
 type replicated
 min_size 1
 max_size 10
 step take default
 step chooseleaf firstn 0 type rack
 step emit
}

and

rule 2rack_2host{
 ruleset 0
 type replicated
 min_size 1
 max_size 10
 step take default
 step choose firstn 2 type rack
 step chooseleaf firstn 2 type host
 step emit
}

As I understand it, the first rule, rack_rule, uses the rack as the failure domain, so every PG gets OSDs from different racks. For example, with 2 racks and replication size = 2, a PG [osd.1,osd.2] should have its 2 OSDs in different racks.

I think the second rule should select 2 different racks and then 2 different hosts in each rack. So with 2 racks and replication size = 2, a PG [osd.1,osd.2] should again have its 2 OSDs in different racks.

That is what I understood in theory, but I don't see these results in practice. With both rules, I get PGs whose OSDs are in the same rack, in a pool with replication size 2.

Your conclusion is not entirely correct. The first rule

step take default
step chooseleaf firstn 0 type rack

you did understand correctly. Ceph will choose as many racks (underneath the "default" root in the CRUSH tree) as the pool's size parameter defines, and one OSD within each of them. The second rule works a little differently:

step take default
step choose firstn 2 type rack
step chooseleaf firstn 2 type host

Ceph will select exactly 2 racks underneath the root "default", and in each rack it will then choose 2 hosts. But this rule is designed for size = 4, not 2. If you use it with size = 2, you end up exactly as you described: only the first two entries of the result are used, so two hosts in the same rack hold both replicas of each PG. If that rack fails, those PGs become inactive and clients encounter I/O errors until the problem is resolved. By the way, don't use size = 2.
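To make the difference concrete, here is a small Python toy model of the two rules. It is not the real CRUSH algorithm: the topology, the OSD names and the use of Python's `random` module in place of CRUSH's hashing are all assumptions made for illustration. The only behaviour it tries to mirror is the order in which the steps emit OSDs and the fact that the result is cut down to the pool size.

```python
import random

# Hypothetical topology: 2 racks, 2 hosts per rack, 2 OSDs per host.
TOPOLOGY = {
    "rack1": {"host1": ["osd.0", "osd.1"], "host2": ["osd.2", "osd.3"]},
    "rack2": {"host3": ["osd.4", "osd.5"], "host4": ["osd.6", "osd.7"]},
}


def rack_of(osd):
    """Return the rack that contains the given OSD."""
    for rack, hosts in TOPOLOGY.items():
        if any(osd in osds for osds in hosts.values()):
            return rack
    raise KeyError(osd)


def rack_rule(pg, size):
    """Toy version of: step chooseleaf firstn 0 type rack."""
    rng = random.Random(pg)  # stand-in for CRUSH's deterministic hashing
    racks = rng.sample(sorted(TOPOLOGY), min(size, len(TOPOLOGY)))
    result = []
    for rack in racks:
        # chooseleaf descends from each chosen rack to a single OSD.
        host = rng.choice(sorted(TOPOLOGY[rack]))
        result.append(rng.choice(TOPOLOGY[rack][host]))
    return result[:size]


def two_rack_two_host(pg, size):
    """Toy version of: choose firstn 2 type rack, chooseleaf firstn 2 type host."""
    rng = random.Random(pg)
    result = []
    for rack in rng.sample(sorted(TOPOLOGY), 2):
        # Both hosts of the first rack are emitted before the second rack.
        for host in rng.sample(sorted(TOPOLOGY[rack]), 2):
            result.append(rng.choice(TOPOLOGY[rack][host]))
    return result[:size]  # the pool size truncates the 4-entry list


if __name__ == "__main__":
    for pg in range(3):
        for rule in (rack_rule, two_rack_two_host):
            osds = rule(pg, 2)
            print(pg, rule.__name__, osds, [rack_of(o) for o in osds])
```

In this model, rack_rule with size 2 always spans both racks, while 2rack_2host with size 2 keeps only the first two entries, which both come from the first rack chosen. With size 4 the second rule covers 2 racks and 4 hosts as intended.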

There's a tool called crushtool that lets you test your changes before actually applying them. It's very helpful, try it out!
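As a rough sketch of how such a test could be scripted, the Python helper below builds a crushtool test invocation and runs it only if the binary is installed. The file name crushmap.bin is an assumption (a compiled map exported beforehand, for example with `ceph osd getcrushmap -o crushmap.bin`), and the helper names are hypothetical. Check the flags against the crushtool man page for your Ceph release.

```python
import shutil
import subprocess


def crushtool_test_cmd(map_path, rule_id, num_rep):
    """Build the argument list for a crushtool mapping test."""
    return [
        "crushtool", "-i", map_path, "--test",
        "--rule", str(rule_id),      # rule (ruleset) number to exercise
        "--num-rep", str(num_rep),   # replica count, i.e. the pool size
        "--show-mappings",           # print the OSD list for each input
    ]


def run_crushtool_test(map_path, rule_id, num_rep):
    """Run the test and return its output, or None if crushtool is missing."""
    if shutil.which("crushtool") is None:
        return None
    cmd = crushtool_test_cmd(map_path, rule_id, num_rep)
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout


if __name__ == "__main__":
    # crushmap.bin is a hypothetical path to an exported, compiled CRUSH map.
    print(" ".join(crushtool_test_cmd("crushmap.bin", 0, 2)))
    output = run_crushtool_test("crushmap.bin", 0, 2)
    if output is not None:
        print(output)
```

Running it with --num-rep 2 and then --num-rep 4 against the second rule should show whether the mapped OSDs land in one rack or two, without touching the live cluster.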
