MongoDB is draining a shard but the balancer is not running? (removeShard taking too much time)

Question

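I am trying to scale down a sharded cluster that currently has 8 shards to a cluster with 4 shards.

I started with the 8th shard and tried to remove it first.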

db.adminCommand( { removeShard : "rs8" } );
----
{
    "msg" : "draining ongoing",
    "state" : "ongoing",
    "remaining" : {
        "chunks" : NumberLong(1575),
        "dbs" : NumberLong(0)
    },
    "note" : "you need to drop or movePrimary these databases",
    "dbsToMove" : [ ],
    "ok" : 1
}

So there are 1575 chunks to migrate to the rest of the cluster.
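Re-issuing removeShard against a shard that is already draining only reports its current status, so progress can be tracked by polling it. A minimal sketch, assuming the response shape shown above; the helper name `remainingChunks` and its parameters are hypothetical, not part of the shell:

```javascript
// Hypothetical helper: number of chunks still on a draining shard.
// `adminDb` is assumed to expose adminCommand(), like `db` in a mongo shell
// connected to a mongos.
function remainingChunks(adminDb, shardName) {
  // On a shard that is already draining, removeShard just returns the status.
  const res = adminDb.adminCommand({ removeShard: shardName });
  if (!res.ok) {
    throw new Error("removeShard failed: " + JSON.stringify(res));
  }
  if (!res.remaining) {
    // "started" and "completed" responses carry no "remaining" document.
    return res.state === "completed" ? 0 : null;
  }
  const chunks = res.remaining.chunks;
  // The shell returns a NumberLong; convert it to a plain number if possible.
  return typeof chunks.toNumber === "function" ? chunks.toNumber() : Number(chunks);
}
```

From a mongos connection this could be called as `remainingChunks(db, "rs8")` at intervals to see how fast the count is dropping.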

But running sh.isBalancerRunning() returns false, and the output of sh.status() looks like this:

  ...
  ...

  active mongoses:
        "3.4.10" : 16
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled:  yes
        Currently running:  no
NaN
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours: 
                59 : Success
                1 : Failed with error 'aborted', from rs8 to rs1
                1 : Failed with error 'aborted', from rs2 to rs6
                1 : Failed with error 'aborted', from rs8 to rs5
                4929 : Failed with error 'aborted', from rs2 to rs7
                1 : Failed with error 'aborted', from rs8 to rs2
                506 : Failed with error 'aborted', from rs8 to rs7
                1 : Failed with error 'aborted', from rs2 to rs3
...
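Most of the recorded migrations are failures. To put a number on that, the sketch below tallies the "Migration Results" lines pasted above. It only parses the text shown and does not query the cluster; the function name `tallyMigrations` is hypothetical.

```javascript
// Hypothetical helper: tally the "Migration Results" lines of sh.status().
function tallyMigrations(lines) {
  const tally = { success: 0, failed: 0 };
  for (const line of lines) {
    // Each line looks like "<count> : Success" or "<count> : Failed with error ...".
    const match = /^\s*(\d+)\s*:\s*(Success|Failed)/.exec(line);
    if (!match) continue;
    const count = Number(match[1]);
    if (match[2] === "Success") tally.success += count;
    else tally.failed += count;
  }
  return tally;
}

// The lines copied from the output above.
const results = tallyMigrations([
  "59 : Success",
  "1 : Failed with error 'aborted', from rs8 to rs1",
  "1 : Failed with error 'aborted', from rs2 to rs6",
  "1 : Failed with error 'aborted', from rs8 to rs5",
  "4929 : Failed with error 'aborted', from rs2 to rs7",
  "1 : Failed with error 'aborted', from rs8 to rs2",
  "506 : Failed with error 'aborted', from rs8 to rs7",
  "1 : Failed with error 'aborted', from rs2 to rs3",
]);
console.log(results); // { success: 59, failed: 5440 }
```

Nearly all of the aborted attempts have rs7 as the destination, either from rs2 or from rs8.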

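So the balancer is enabled but not running. But there is a draining shard (rs8) being removed, so I think the balancer should be running constantly, right? Yet it is not, as the output above shows.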
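"Enabled" and "running" are separate pieces of state, and the balancer can be enabled while no round is in progress at the moment of the check. A minimal sketch that gathers the relevant settings in one place, assuming a mongos connection; the helper name `balancerSummary` and its parameters are hypothetical, while `sh.getBalancerState()`, `sh.isBalancerRunning()` and the `config.settings` balancer document are standard:

```javascript
// Hypothetical helper: collect the balancer settings that affect draining.
// `shHelper` is assumed to be the shell's `sh` object and `configDb` the
// result of db.getSiblingDB("config"), both on a mongos connection.
function balancerSummary(shHelper, configDb) {
  const settings = configDb.settings.findOne({ _id: "balancer" }) || {};
  return {
    // Whether the balancer is allowed to run at all.
    enabled: shHelper.getBalancerState(),
    // Whether a balancing round is in progress right now (a boolean in 3.4).
    runningNow: shHelper.isBalancerRunning(),
    // If an activeWindow is configured, migrations only happen in those hours.
    activeWindow: settings.activeWindow || null,
  };
}
```

In the shell this could be called as `balancerSummary(sh, db.getSiblingDB("config"))`.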

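The process is also taking a very long time: over almost the last day, the number of remaining chunks dropped by only 10, from 1575 to 1565! At this rate it will take months to scale the 8-shard cluster down to 4 shards!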
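As a back-of-the-envelope check of that estimate, the sketch below divides the remaining chunks by the observed daily rate. It assumes the net rate stays constant, which is a simplification; the function name is hypothetical.

```javascript
// Estimate how many days draining takes at a constant net migration rate.
function estimateDrainDays(remainingChunks, chunksPerDay) {
  if (chunksPerDay <= 0) return Infinity; // no net progress
  return remainingChunks / chunksPerDay;
}

// Figures from the post: 1565 chunks left, 10 chunks moved in about a day.
console.log(estimateDrainDays(1565, 10)); // 156.5
```

That is roughly five months for rs8 alone, before the other three shards are drained.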

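It seems that MongoDB itself does not stop writes to the draining shard, so is what I am seeing a rate of chunk growth that almost cancels out the reduction?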

Any help is greatly appreciated!
Thanks

Answer 1

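Edit

Great, now, a month later, the process has finished and I have a 4-shard cluster! The trick I describe below helped cut down the time it could have taken, but honestly this was the slowest thing I have ever done.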

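OK, so answering my own question here.

I could not get the automatic balancing to run at the speed I wanted. What I observed was about 5 to 7 chunks being migrated per day (meaning the whole process would take years!)

What I did to partly overcome this was to use the moveChunk command manually.

So what I basically did was: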

while 'can still sample':
    // Sample the 8th shard for 100 documents
    db.col.aggregate([{$sample: {size: 100}}])

    For every document:
        // NUM is the number of the destination shard
        sh.moveChunk(namespace, {shardKey: value}, `rs${NUM}`);
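The loop above can be sketched as a function along these lines. This is an illustration under stated assumptions, not the exact script that was used: the function name, its parameters, and the round-robin choice of destination are hypothetical, and it assumes a single-field, non-hashed shard key. `sh.moveChunk(namespace, query, destination)` and the `$sample` stage are standard.

```javascript
// Hypothetical sketch of one pass of the manual loop. Assumptions:
//  - `shardColl` is the collection opened on a direct connection to rs8
//  - `shHelper` is the `sh` object of a shell connected to a mongos
//  - `shardKeyField` names a single, non-hashed shard key field
function moveSampledChunks(shardColl, shHelper, namespace, shardKeyField, targets, sampleSize) {
  // Sample documents that currently live on the draining shard.
  const docs = shardColl.aggregate([{ $sample: { size: sampleSize } }]).toArray();
  let moved = 0;
  let failed = 0;
  docs.forEach((doc, i) => {
    // The chunk to move is the one containing this shard key value.
    const query = {};
    query[shardKeyField] = doc[shardKeyField];
    // Assumption: spread chunks over the target shards in round-robin order.
    const target = targets[i % targets.length];
    try {
      const res = shHelper.moveChunk(namespace, query, target);
      if (res && res.ok) moved += 1;
      else failed += 1;
    } catch (e) {
      // Some shells throw instead of returning { ok: 0 }; count it and go on.
      failed += 1;
    }
  });
  return { moved: moved, failed: failed };
}
```

Several sampled documents can belong to the same chunk, so the number of distinct chunks moved per pass may be lower than the sample size.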

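So I manually moved chunks from the 8th shard to the first 4 shards. One drawback: because the balancer has to stay enabled and only one shard can be draining at a time, some of the migrated chunks get automatically migrated again to shards 5-7, which I want to remove later, and that makes the process take longer. Any solutions?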

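Since the 8th shard is draining, the balancer will not fill it again, and the whole process is now much faster, about 350-400 chunks per day. So hopefully each shard will take about 5 days at most, and the whole resize about 20 days!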

This is the fastest I could manage. I would appreciate any other answers or strategies for doing this kind of scale-down better.

Solution 1
0 2018-10-29 14:13:16
