Elasticsearch Scroll API connection time

Original post: 2020-04-07 23:18:47, tagged elasticsearch

We are using Elasticsearch 6.8. I want to use the Scroll API ( https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-scroll.html ) with a connection time such as scroll=1m. (1m is only an example; what I am asking about is the maximum value, whether in minutes or hours.)

What I am wondering about is this scroll connection time. If I send a request with the scrollId, the connection time resets, but what is its maximum value, and is it bad to keep the connection open for a very long time?

I want to use a scrollId over 1-10 million records and export my documents in batches, one batch every minute. Also, if my system goes down for some reason, I want to continue where I stopped, so I want to keep my connection open as long as possible, provided it does not use extra memory, CPU, etc. What is the maximum time that I can keep the connection alive, and what should it be? Or should I keep it alive at all?

Thanks!

1 Answer

By default, the maximum value for keeping a scroll context alive is 24h (24 hours). This limit can be changed through the "search.max_keep_alive" cluster setting.
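As a rough sketch of how that limit might be changed, the snippet below builds a request for the cluster update settings API (PUT /_cluster/settings). The helper name build_max_keep_alive_request and the 48h value are assumptions made for illustration; the helper only builds the request and does not send it.

```python
import json


def build_max_keep_alive_request(value, persistent=True):
    """Return (method, path, body) for updating search.max_keep_alive.

    Hypothetical helper for illustration only: it builds the request
    but leaves sending it to whatever HTTP client you use.
    """
    # "persistent" survives a full cluster restart, "transient" does not.
    scope = "persistent" if persistent else "transient"
    body = {scope: {"search.max_keep_alive": value}}
    return "PUT", "/_cluster/settings", json.dumps(body)


# Example: raise the limit to 48 hours (an arbitrary example value).
method, path, body = build_max_keep_alive_request("48h")
print(method, path)
print(body)
```

Raising the limit only permits longer keep-alive values; it does not make long-lived scroll contexts any cheaper.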

Setting a large value can increase the load on the shards.
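Because each scroll request sets a new expiry time, the keep-alive only has to cover the processing of one batch, not the whole export. The loop below is a minimal sketch of that pattern. The send(method, path, body) callable is a hypothetical transport (anything that performs the HTTP call and returns the decoded JSON), and scroll_batches, make_fake_send and the index name my-index are illustrative names, not part of Elasticsearch.

```python
def scroll_batches(send, index, keep_alive="1m", size=1000):
    """Yield batches of hits from a scroll and clear the context afterwards.

    `send(method, path, body)` is an assumed transport callable that
    performs the request and returns the decoded JSON response.
    """
    # Initial search opens the scroll; sorting by _doc is the cheapest order.
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": size, "sort": ["_doc"]})
    scroll_id = resp.get("_scroll_id")
    try:
        while resp["hits"]["hits"]:
            yield resp["hits"]["hits"]
            # Every scroll request resets the expiry to `keep_alive`.
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Free the search context as soon as we are done with it.
        if scroll_id:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})


def make_fake_send(pages):
    """Build an in-memory stand-in for the transport, for demonstration."""
    calls = []
    remaining = list(pages)

    def send(method, path, body):
        calls.append((method, path))
        if method == "DELETE":
            return {"succeeded": True}
        hits = remaining.pop(0) if remaining else []
        return {"_scroll_id": "fake-id", "hits": {"hits": hits}}

    return send, calls


send, calls = make_fake_send([[{"_id": "1"}, {"_id": "2"}], [{"_id": "3"}]])
batches = list(scroll_batches(send, "my-index"))
print(len(batches), "batches,", len(calls), "requests")
```

With a real transport, a keep-alive of a minute or two is usually enough as long as one batch is processed within that window.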

From the documentation:

Scrolling is not intended for real time user requests, but rather for processing large amounts of data, e.g. in order to reindex the contents of one index into a new index with a different configuration.

From the documentation:

Normally, the background merge process optimizes the index by merging together smaller segments to create new bigger segments, at which time the smaller segments are deleted. This process continues during scrolling, but an open search context prevents the old segments from being deleted while they are still in use. This is how Elasticsearch is able to return the results of the initial search request, regardless of subsequent changes to documents.

From the documentation:

Search context are automatically removed when the scroll timeout has been exceeded. However keeping scrolls open has a cost, as discussed in the previous section so scrolls should be explicitly cleared as soon as the scroll is not being used anymore using the clear-scroll API.
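The clear-scroll call itself is a DELETE on /_search/scroll with the scroll ID (or a list of IDs) in the body. The snippet below is a small sketch that only builds such a request; build_clear_scroll_request and the placeholder IDs are assumptions for illustration.

```python
import json


def build_clear_scroll_request(scroll_ids):
    """Return (method, path, body) for clearing one or more scroll contexts.

    Hypothetical helper: it builds the request but does not send it.
    """
    ids = [scroll_ids] if isinstance(scroll_ids, str) else list(scroll_ids)
    if not ids:
        raise ValueError("at least one scroll ID is required")
    # A single ID is sent as a string, several IDs as an array.
    payload = {"scroll_id": ids[0] if len(ids) == 1 else ids}
    return "DELETE", "/_search/scroll", json.dumps(payload)


# Placeholder IDs; real ones come from the _scroll_id field of a response.
method, path, body = build_clear_scroll_request(["id-1", "id-2"])
print(method, path, body)
```

All open search contexts can also be cleared at once with DELETE /_search/scroll/_all, which is best reserved for cleanup since it affects every scroll on the cluster.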