Server error while performing Elasticsearch Reindex in place operation
I am using AWS Elasticsearch Service (version 6.3). I want to change the mapping while reindexing data from current_index to new_index. I am not trying to upgrade from an older Elasticsearch cluster to a new one: both current_index and new_index are on the same Elasticsearch 6.3 cluster.
I am trying to perform a reindex in place by following the information in the Elastic documentation.
My index contains about 250k searchable documents. When I POST a _reindex request using curl:
curl -X POST "aws_elasticsearch_endpoint/_reindex" -H 'Content-Type: application/json' -d'
{
"source": {
"index": "current_index"
},
"dest": {
"index": "new_index"
}
}
'
Elasticsearch starts the reindex process (I verify this by running GET /_cat/indices?v), but curl eventually fails with curl: (56) Unexpected EOF. The reindex operation itself actually works fine: after about 2 hours, the docs.count of new_index matches that of current_index and its health turns green.
If I POST _reindex from Java, I get this error:
java.net.SocketException: Unexpected end of file from server
The Reindex API returns successfully, as specified here, only when my index is small (I tried with about 1k searchable documents).
The AWS Elasticsearch ELB (Elastic Load Balancer) has a timeout of 60 seconds. This is not configurable at the moment and has been a long-standing feature request. You can find more details in this AWS forum thread.
As a result, any operation that takes more than 60 seconds (in this particular case, a reindex) results in a gateway timeout, and you cannot block on a long-running reindex simply by increasing the client timeout.
For the reindex API, the workaround is the one suggested by @Val: use the wait_for_completion=false flag and follow the steps in the Reindex API documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#_url_parameters_3
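A minimal shell sketch of this workaround, assuming the domain accepts unsigned requests like the curl call in the question does. The endpoint argument stands in for aws_elasticsearch_endpoint, and the function names are hypothetical. The request returns as soon as the background task has been created (the body has the form {"task":"<nodeId>:<number>"}), so the 60-second ELB limit no longer comes into play.

```shell
# Hypothetical helpers; the first argument of start_reindex is the domain
# endpoint (aws_elasticsearch_endpoint in the question), requests are unsigned.

# Pull the task id out of a response such as {"task":"nodeId:12345"}.
# Prints nothing if the response has no "task" field (e.g. an error body).
extract_task_id() {
  printf '%s\n' "$1" | sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# POST _reindex with wait_for_completion=false and print the task id.
start_reindex() {
  local endpoint="$1" source_index="$2" dest_index="$3" response
  response=$(curl -s -X POST "$endpoint/_reindex?wait_for_completion=false" \
    -H 'Content-Type: application/json' \
    -d "{\"source\":{\"index\":\"$source_index\"},\"dest\":{\"index\":\"$dest_index\"}}") || return 1
  extract_task_id "$response"
}
```

Usage: TASK_ID=$(start_reindex "$ES_ENDPOINT" current_index new_index). The task id is extracted with sed only to keep the sketch free of dependencies; jq -r .task is sturdier if it is installed.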
This is because the response takes a long time to return and curl times out. On small data sets, the response comes back before the timeout, which is why you get a response there.
When curl times out, though, the reindex is still in progress, and you can still see how it is doing with this command:
GET _tasks?actions=*reindex&detailed=true
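The same check can be run from a shell. This is a rough sketch with hypothetical helper names. It assumes compact (non-?pretty) JSON and a single running reindex task, in which case the first total, created and updated fields come from that task's status block; (created + updated) * 100 / total then gives an approximate progress percentage.

```shell
# Hypothetical helpers; the first argument is the domain endpoint, unsigned requests.

# Same request as GET _tasks?actions=*reindex&detailed=true.
list_reindex_tasks() {
  curl -s "$1/_tasks?actions=*reindex&detailed=true"
}

# Print the first numeric value of key $2 in the compact JSON string $1.
json_number() {
  printf '%s\n' "$1" | grep -o "\"$2\":[0-9]*" | head -n 1 | cut -d: -f2
}

# Rough progress in percent: (created + updated) * 100 / total.
# Prints 0 when no total is found (e.g. no reindex task is running).
reindex_progress() {
  local json="$1" total created updated
  total=$(json_number "$json" total)
  created=$(json_number "$json" created)
  updated=$(json_number "$json" updated)
  if [ -z "$total" ] || [ "$total" -eq 0 ]; then
    echo 0
    return
  fi
  echo $(( (${created:-0} + ${updated:-0}) * 100 / total ))
}
```

Usage: reindex_progress "$(list_reindex_tasks "$ES_ENDPOINT")" prints an integer percentage. With sliced reindexes or several reindex tasks, the "first field wins" shortcut no longer holds and a real JSON parser is the better choice.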
You can also add ...?wait_for_completion=false to your curl command. ES will then create a background task for your reindex operation. The curl command terminates early and returns a taskId that you can use to check the state of the reindex regularly using the Task API:
GET .tasks/task/<taskId>
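Checking the task regularly can be scripted as a polling loop. This sketch makes the same assumptions (unsigned requests, hypothetical function names) and calls the Task Management API endpoint GET _tasks/<taskId>, which on 6.x also returns the stored result once the task has finished; reading .tasks/task/<taskId> directly, as above, should yield the same result document. A completed task can still report failures or an error, so inspect the printed response.

```shell
# Hypothetical helpers; the first argument is the domain endpoint, unsigned requests.

# Succeeds (exit status 0) when a Task API response reports "completed":true.
task_completed() {
  printf '%s\n' "$1" | grep -Eq '"completed"[[:space:]]*:[[:space:]]*true'
}

# Poll GET _tasks/<taskId> every $3 seconds (default 60) until the task
# reports completion, then print the final response. Returns 1 if curl fails
# or ES answers with an error (e.g. an unknown task id) instead of a task.
wait_for_task() {
  local endpoint="$1" task_id="$2" interval="${3:-60}" response
  while :; do
    response=$(curl -s "$endpoint/_tasks/$task_id") || return 1
    if task_completed "$response"; then
      printf '%s\n' "$response"
      return 0
    fi
    if printf '%s\n' "$response" | grep -q '"error"'; then
      printf '%s\n' "$response" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

Usage: wait_for_task "$ES_ENDPOINT" "$TASK_ID" 60, where TASK_ID is the taskId returned by the reindex call.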
Also note that in this case, when the task is done, you also need to remove the task from the .tasks index yourself; ES will not do it for you.
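Removing the stored result amounts to a DELETE on the same document. This sketch assumes the 6.x layout of the .tasks index, where results are stored under the mapping type task with the task id as the document id; the helper names are hypothetical.

```shell
# Hypothetical helpers; the first argument is the domain endpoint, unsigned requests.

# URL of the stored result document for a task (6.x layout: type "task").
task_result_url() {
  printf '%s/.tasks/task/%s\n' "$1" "$2"
}

# Delete the stored result once you have read it; ES does not clean it up.
delete_task_result() {
  curl -s -X DELETE "$(task_result_url "$1" "$2")"
}
```

Usage: delete_task_result "$ES_ENDPOINT" "$TASK_ID" after you have checked the final result.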