Server error when performing an in-place Elasticsearch Reindex operation

Question

I am using the AWS Elasticsearch Service (version 6.3). I want to change the mapping while reindexing data from current_index to new_index. I am not trying to upgrade from an older Elasticsearch cluster to a new one; both current_index and new_index are on the same Elasticsearch 6.3 cluster.
I am trying to perform a reindex-in-place operation by following the Elastic documentation.
My index contains about 250k searchable documents. When I POST a _reindex request using curl,

curl -X POST "aws_elasticsearch_endpoint/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "current_index"
  },
  "dest": {
    "index": "new_index"
  }
}
'

Elasticsearch starts the reindex process (I verified this with GET /_cat/indices?v), and I end up getting a curl: (56) Unexpected EOF error. The reindex operation itself actually works fine: after about 2 hours, the docs.count of new_index matches that of current_index and the status turns green.

If I POST _reindex from Java, I get this error:

java.net.SocketException: Unexpected end of file from server

The Reindex API returns successfully, as documented, only when the number of documents in my index is small (I tried with about 1k searchable documents).

Answer 1

The AWS Elasticsearch ELB (Elastic Load Balancer) has a timeout of 60 seconds. This is not configurable at the moment, and making it configurable has been a long-standing feature request.
You can find more details in this AWS forum thread.

As a result, any operation that takes more than 60 seconds, in this case the reindex, results in a gateway timeout.
It is therefore not possible to block on a long-running reindex by increasing the client timeout.

For the Reindex API, the workaround is the one suggested by @Val: use the wait_for_completion=false flag and follow the steps described in the Reindex API documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#_url_parameters_3
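
As a rough sketch of that workaround, the request from the question can be sent with wait_for_completion=false so that it returns well within the 60-second limit. The ES_ENDPOINT variable and the start_reindex and extract_task_id helper names are assumptions made for illustration, and the sed expression is a simplification that assumes the response is a single line of the form {"task":"..."}.

```shell
# Assumption: ES_ENDPOINT holds the domain endpoint (the question's placeholder is the default).
ES_ENDPOINT="${ES_ENDPOINT:-aws_elasticsearch_endpoint}"

# Hypothetical helper: start the reindex as a background task.
# Elasticsearch answers right away with a body of the form {"task":"<nodeId>:<number>"}.
start_reindex() {
  curl -s -X POST "$ES_ENDPOINT/_reindex?wait_for_completion=false" \
    -H 'Content-Type: application/json' \
    -d '{"source": {"index": "current_index"}, "dest": {"index": "new_index"}}'
}

# Hypothetical helper: read a JSON body on stdin and print the value of its "task" field.
extract_task_id() {
  sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage (performs a real request, so it is left commented out):
# task_id=$(start_reindex | extract_task_id)
# echo "reindex running as task $task_id"
```

Because the call returns immediately, the load balancer timeout no longer matters; the task id is what you keep in order to check on the reindex later.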

Answer 2

This happens because the response takes a long time to return and curl times out. On small data sets, the response comes back before the timeout, which is why you get a response.

When curl times out, the reindex is still in progress, though, and you can see how it is doing using this command:

GET _tasks?actions=*reindex&detailed=true

You can also add ...?wait_for_completion=false to your curl command. ES will then create a background task for your reindex operation. The curl command will return early with a taskId that you can use to regularly check the state of the reindex using the Task API:

GET .tasks/task/<taskId>

Also note that in this case, when the task is done, you'll need to remove the task from the .tasks index yourself; ES will not do it for you.
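
The polling and cleanup steps could be scripted roughly as follows. This is only a sketch: ES_ENDPOINT, the helper names, and the 30-second interval are assumptions. It queries GET _tasks/<taskId> (the Task API form of the lookup) and relies on the "completed" field in that response. There is no error handling, so a failing request would keep the loop running, and whether the .tasks index can be modified on a given AWS domain is also an assumption.

```shell
# Assumption: ES_ENDPOINT holds the domain endpoint (the question's placeholder is the default).
ES_ENDPOINT="${ES_ENDPOINT:-aws_elasticsearch_endpoint}"

# Hypothetical helper: fetch the current state of a task through the Task API.
task_status() {
  curl -s "$ES_ENDPOINT/_tasks/$1"
}

# Hypothetical helper: succeed (exit 0) when the JSON on stdin reports "completed": true.
is_completed() {
  grep -Eq '"completed"[[:space:]]*:[[:space:]]*true'
}

# Poll until the task finishes, then delete its stored record from the .tasks index.
wait_and_cleanup() {
  until task_status "$1" | is_completed; do
    sleep 30   # hypothetical polling interval
  done
  curl -s -X DELETE "$ES_ENDPOINT/.tasks/task/$1"
}

# Usage (performs real requests, so it is left commented out):
# wait_and_cleanup "<taskId>"
```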
