Elasticsearch on AWS: How to fix unassigned shards?

Question

I have an index on AWS Elasticsearch with a shard that is unassigned because of NODE_LEFT. Here is the output of _cat/shards:

rawindex-2017.07.04                     1 p STARTED    
rawindex-2017.07.04                     3 p UNASSIGNED NODE_LEFT
rawindex-2017.07.04                     2 p STARTED    
rawindex-2017.07.04                     4 p STARTED    
rawindex-2017.07.04                     0 p STARTED
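On a larger cluster it helps to print only the unassigned rows with their reason. Below is a minimal sketch. It assumes a hypothetical ES_ENDPOINT variable that holds the placeholder endpoint used in the answers. It also assumes the domain's access policy accepts unsigned curl requests, as the commands in this thread do. The helper names are illustrative. The column list uses the h= parameter of _cat/shards.

```shell
# Hypothetical variable; defaults to the placeholder used elsewhere in this thread.
ES_ENDPOINT="${ES_ENDPOINT:-https://<es-endpoint>}"

# Keep only rows whose 4th column (state) is UNASSIGNED.
# Reads _cat/shards text on stdin, so it can be checked without a cluster.
filter_unassigned() {
  awk '$4 == "UNASSIGNED"'
}

# Ask for index, shard, primary/replica, state and the unassignment reason.
list_unassigned_shards() {
  curl -sS "$ES_ENDPOINT/_cat/shards?h=index,shard,prirep,state,unassigned.reason" \
    | filter_unassigned
}
```

For the output above, list_unassigned_shards should print only the line for shard 3 with NODE_LEFT.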

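Normally it would be easy to reassign these shards through the _cluster or _settings APIs. However, those are exactly the APIs that AWS does not allow. I get the following message: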

{
    Message: "Your request: '/_settings' is not allowed."
}

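According to an answer to a very similar question, I can change an index's settings through the _index API, which AWS does allow. However, index.routing.allocation.disable_allocation does not seem to be valid in Elasticsearch 5.x, which is the version I am running. I get the following error: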

{
    "error": {
        "root_cause": [
            {
                "type": "remote_transport_exception",
                "reason": "[enweggf][x.x.x.x:9300][indices:admin/settings/update]"
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "unknown setting [index.routing.allocation.disable_allocation] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"
    },
    "status": 400
}
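As far as I know, the index-level setting that replaced disable_allocation in 5.x is index.routing.allocation.enable. It accepts all, primaries, new_primaries or none. The sketch below sets it through the same per-index _settings endpoint. Its default is already all, so this helps only if allocation had been switched off for the index. It will not by itself bring back a primary lost with NODE_LEFT. ES_ENDPOINT and the function names are hypothetical.

```shell
# Hypothetical variable; defaults to the placeholder used elsewhere in this thread.
ES_ENDPOINT="${ES_ENDPOINT:-https://<es-endpoint>}"

# Build the settings body; valid values are all, primaries, new_primaries, none.
allocation_enable_body() {
  printf '{"index.routing.allocation.enable": "%s"}' "${1:-all}"
}

# Apply the setting to one index via PUT <index>/_settings.
enable_allocation() {
  curl -sS -X PUT "$ES_ENDPOINT/$1/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(allocation_enable_body "${2:-all}")"
}
```

Usage: enable_allocation rawindex-2017.07.04 all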

I tried prioritizing the index's recovery with a high index.priority and set index.unassigned.node_left.delayed_timeout to 1 minute, but I still cannot get the shards reassigned.
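For reference, both settings mentioned above go through the per-index _settings endpoint. Here is a minimal sketch of that request. The priority 10 is an arbitrary example value, and ES_ENDPOINT and the helper names are hypothetical. Neither setting forces allocation. index.priority only orders the recovery of unassigned shards. As far as I know, the delayed timeout applies to replica shards, so it may have no effect on a primary-only index like this one.

```shell
# Hypothetical variable; defaults to the placeholder used elsewhere in this thread.
ES_ENDPOINT="${ES_ENDPOINT:-https://<es-endpoint>}"

# Body with a recovery priority (integer) and a delayed-allocation timeout (time value).
recovery_settings_body() {
  printf '{"index.priority": %d, "index.unassigned.node_left.delayed_timeout": "%s"}' "$1" "$2"
}

# Apply both settings to one index; defaults are illustrative only.
apply_recovery_settings() {
  curl -sS -X PUT "$ES_ENDPOINT/$1/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(recovery_settings_body "${2:-10}" "${3:-1m}")"
}
```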

Is there any way (dirty or elegant) to do this on AWS managed ES?

Thanks!

Answer 1

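I ran into a similar problem on AWS Elasticsearch 6.3: 2 shards could not be allocated and the cluster status was RED. Running GET _cluster/allocation/explain showed that they had exceeded the default maximum of 5 allocation retries.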
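Without a body, the explain API reports on the first unassigned shard it finds. To ask about one specific shard, pass index, shard and primary in the request body. Below is a minimal curl sketch. ES_ENDPOINT and the function names are hypothetical, and the example values come from the question's _cat/shards output.

```shell
# Hypothetical variable; defaults to the placeholder used elsewhere in this thread.
ES_ENDPOINT="${ES_ENDPOINT:-https://<es-endpoint>}"

# Body selecting one shard: index name, shard number, and whether it is the primary.
explain_body() {
  printf '{"index": "%s", "shard": %d, "primary": %s}' "$1" "$2" "$3"
}

# GET with a JSON body; the response's explanation fields give the allocation decision.
explain_shard() {
  curl -sS -X GET "$ES_ENDPOINT/_cluster/allocation/explain?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(explain_body "$1" "$2" "$3")"
}
```

Usage: explain_shard rawindex-2017.07.04 3 true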

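Running GET <my-index-name>/_settings showed the few settings that can be changed per index. Note that all the queries here are in Kibana (Dev Tools console) format, which you get out of the box if you use the AWS Elasticsearch Service. The following solved my problem: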

PUT <my-index-name>/_settings
{
  "index.allocation.max_retries": 6
}

Running GET _cluster/allocation/explain immediately afterwards returned an error containing "reason": "unable to find any unassigned shards to explain...", and after a while the problem was resolved.
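The same fix can be sent with curl instead of the Kibana console, followed by a wait until the shards come back. This is a sketch, assuming the hypothetical ES_ENDPOINT variable and helper names below. The retry limit probably needs to be higher than the number of failed attempts reported by the explain API; here 6 is one more than the default of 5. The health call waits for at least yellow, which means every primary shard of the index is assigned.

```shell
# Hypothetical variable; defaults to the placeholder used elsewhere in this thread.
ES_ENDPOINT="${ES_ENDPOINT:-https://<es-endpoint>}"

max_retries_body() {
  printf '{"index.allocation.max_retries": %d}' "$1"
}

# Raise the retry limit for one index, then wait up to 60s until the
# index is yellow or green (all primaries assigned).
retry_allocation() {
  local index="$1" retries="${2:-6}"
  curl -sS -X PUT "$ES_ENDPOINT/$index/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(max_retries_body "$retries")" || return
  curl -sS "$ES_ENDPOINT/_cluster/health/$index?wait_for_status=yellow&timeout=60s&pretty"
}
```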

Answer 2

When the other solutions fail, there may be an alternative. If you have a managed Elasticsearch instance on AWS, chances are you can "just" restore a snapshot.

Check the failing indices.

You can use, for example:

curl -X GET "https://<es-endpoint>/_cat/shards"

or

curl -X GET "https://<es-endpoint>/_cluster/allocation/explain"
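To narrow the check to indices that actually need restoring, you can list only the red ones. A red index has at least one unassigned primary. This is a minimal sketch that filters _cat/indices output by its health column. ES_ENDPOINT and the function names are hypothetical.

```shell
# Hypothetical variable; defaults to the placeholder used elsewhere in this thread.
ES_ENDPOINT="${ES_ENDPOINT:-https://<es-endpoint>}"

# Print the index name for rows whose first column (health) is red.
red_indices() {
  awk '$1 == "red" {print $2}'
}

list_red_indices() {
  curl -sS "$ES_ENDPOINT/_cat/indices?h=health,index" | red_indices
}
```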

Check the snapshots.

To find the snapshot repositories, run the following query:

curl -X GET "https://<es-endpoint>/_snapshot?pretty"

Next, let's look at all snapshots in the cs-automated repository:

curl -X GET "https://<es-endpoint>/_snapshot/cs-automated/_all?pretty"
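The _cat/snapshots endpoint gives a more compact view of the same repository. The sketch below keeps only snapshots with status SUCCESS and zero failed shards. This filter is stricter than the one described in this answer, because a PARTIAL snapshot may still contain a good copy of the index you need. ES_ENDPOINT and the function names are hypothetical, and the repository defaults to cs-automated as above.

```shell
# Hypothetical variable; defaults to the placeholder used elsewhere in this thread.
ES_ENDPOINT="${ES_ENDPOINT:-https://<es-endpoint>}"

# Columns are id, status, failed_shards; print ids of fully successful snapshots.
successful_snapshots() {
  awk '$2 == "SUCCESS" && $3 == 0 {print $1}'
}

list_successful_snapshots() {
  local repo="${1:-cs-automated}"
  curl -sS "$ES_ENDPOINT/_cat/snapshots/$repo?h=id,status,failed_shards" \
    | successful_snapshots
}
```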

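Look for a snapshot where failures: [ ] is empty, or where the index you want to restore is not in a failed state. Then delete the index you want to restore: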

curl -XDELETE 'https://<es-endpoint>/<index-name>'

...and restore the deleted index like this:

curl -XPOST 'https://<es-endpoint>/_snapshot/cs-automated/<snapshot-name>/_restore' -d '{"indices": "<index-name>"}' -H 'Content-Type: application/json'
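The delete and restore steps can be combined into one function that waits for the restore to finish before returning. This is a sketch, assuming the hypothetical ES_ENDPOINT variable and function names below. Deleting the index cannot be undone, so pick the snapshot first. Because curl --fail returns non-zero on an HTTP error, the restore is skipped if the delete is rejected. That includes a 404 when the index is already gone.

```shell
# Hypothetical variable; defaults to the placeholder used elsewhere in this thread.
ES_ENDPOINT="${ES_ENDPOINT:-https://<es-endpoint>}"

restore_body() {
  printf '{"indices": "%s"}' "$1"
}

# Delete the broken index, then restore it from the given snapshot.
# wait_for_completion=true makes the restore call block until it is done.
restore_index() {
  local index="$1" snapshot="$2" repo="${3:-cs-automated}"
  curl -fsS -X DELETE "$ES_ENDPOINT/$index" || return
  curl -sS -X POST "$ES_ENDPOINT/_snapshot/$repo/$snapshot/_restore?wait_for_completion=true" \
    -H 'Content-Type: application/json' \
    -d "$(restore_body "$index")"
}
```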

There is also some good documentation here:

Answer 3

I ran into a similar problem too. The solution is quite simple, and you can apply it in 2 different ways.

The first solution is to edit all indices at once:

PUT _all/_settings
{
    "index.allocation.max_retries": 3
}

The second solution is to edit a specific index:

PUT <myIndex>/_settings
{
    "index.allocation.max_retries": 3
}
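Both variants can be sent with curl using a single function whose target is either _all or an index name. A comma-separated list or wildcard pattern should also work in that position. This is a minimal sketch. ES_ENDPOINT and the function names are hypothetical, and the default value 3 simply mirrors this answer. If the explain API reports more failed attempts than the value you set, allocation will probably stay blocked, so check that count first.

```shell
# Hypothetical variable; defaults to the placeholder used elsewhere in this thread.
ES_ENDPOINT="${ES_ENDPOINT:-https://<es-endpoint>}"

max_retries_body() {
  printf '{"index.allocation.max_retries": %d}' "$1"
}

# target: _all (default), an index name, a comma-separated list, or a pattern.
set_max_retries() {
  local target="${1:-_all}" retries="${2:-3}"
  curl -sS -X PUT "$ES_ENDPOINT/$target/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(max_retries_body "$retries")"
}
```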


问题描述

3 个解决方案

解决方案1
17 2018-11-15 14:08:19

解决方案2
1 2020-08-26 12:43:33

检查失败的索引。

检查快照。

解决方案3
0 2022-08-22 20:06:42
