
health: HEALTH_ERR - how to fix it without losing data?

I got this ceph status:

# ceph status
  cluster:
    id:     b683c5f1-fd15-4805-83c0-add6fbb7faae
    health: HEALTH_ERR
            1 backfillfull osd(s)
            8 pool(s) backfillfull
            50873/1090116 objects misplaced (4.667%)
            Degraded data redundancy: 34149/1090116 objects degraded (3.133%), 3 pgs degraded, 3 pgs undersized
            Degraded data redundancy (low space): 6 pgs backfill_toofull

  services:
    mon: 3 daemons, quorum tb-ceph-2-prod,tb-ceph-4-prod,tb-ceph-3-prod
    mgr: tb-ceph-1-prod(active)
    osd: 6 osds: 6 up, 6 in; 6 remapped pgs
    rgw: 4 daemons active

  data:
    pools:   8 pools, 232 pgs
    objects: 545.1 k objects, 153 GiB
    usage:   728 GiB used, 507 GiB / 1.2 TiB avail
    pgs:     34149/1090116 objects degraded (3.133%)
             50873/1090116 objects misplaced (4.667%)
             226 active+clean
             3   active+undersized+degraded+remapped+backfill_toofull
             3   active+remapped+backfill_toofull

  io:
    client:   286 KiB/s rd, 2 op/s rd, 0 op/s wr

Here are the OSD statuses:

# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
 2   hdd 0.09769  1.00000 100 GiB  32 GiB  68 GiB 32.38 0.55  30
 5   hdd 0.32230  1.00000 330 GiB 220 GiB 110 GiB 66.71 1.13 122
 0   hdd 0.32230  1.00000 330 GiB 194 GiB 136 GiB 58.90 1.00 125
 1   hdd 0.04390  0.95001  45 GiB  43 GiB 2.5 GiB 94.53 1.60  11
 3   hdd 0.09769  1.00000 100 GiB  42 GiB  58 GiB 42.37 0.72  44
 4   hdd 0.32230  0.95001 330 GiB 196 GiB 134 GiB 59.43 1.01 129
                    TOTAL 1.2 TiB 728 GiB 507 GiB 58.94
MIN/MAX VAR: 0.55/1.60  STDDEV: 19.50

I have tried these commands:

 ceph osd pool set default.rgw.buckets.data pg_num 32
 ceph osd pool set default.rgw.buckets.data pgp_num 32

But it didn't help either. I think pg_num 32 is too small for my OSD count, but I'm not sure whether it's safe to increase it while the health status is HEALTH_ERR.
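
For reference, the current values can be read back before changing anything (same pool name as in the commands above):

 ceph osd pool get default.rgw.buckets.data pg_num
 ceph osd pool get default.rgw.buckets.data pgp_num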

Your OSD #1 is full. The disk drive is fairly small and you should probably exchange it for a 100G drive like the other two you have in use. To remedy the situation, have a look at the Ceph control commands.
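
As a rough sketch of what those control commands look like for a full OSD (the weight and ratio values below are illustrative only, not recommendations; set-backfillfull-ratio requires Luminous or later):

 ceph osd reweight 1 0.8                # temporarily lower the reweight of the full OSD #1
 ceph osd set-backfillfull-ratio 0.92   # give backfill slightly more headroom (default is 0.90)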

The command ceph osd reweight-by-utilization will adjust the weight of overused OSDs and trigger a rebalance of PGs. See also this blog post describing this situation.
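
A minimal sequence for that, assuming the default threshold of 120% of average utilization (the test variant is a dry run that previews the changes):

 ceph osd test-reweight-by-utilization   # dry run: shows which OSDs would be reweighted
 ceph osd reweight-by-utilization 120    # reweight OSDs above 120% of average utilization
 ceph osd df                             # watch PGs move and utilization even out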
