
What considerations should I take into account when increasing the size in the Scroll API in Elasticsearch?

Original post: 2017-06-07 10:08:18 · Tags: elasticsearch

I am currently toying around with the Scroll API of Elasticsearch and want to use it to obtain a large set of data and do some manual processing on it. The processing is performed by an external library and is not of a type that can easily be included as a script.

While this seems to work nicely at the moment, I was wondering what considerations I should take into account when fine-tuning the scroll size for this form of processing. A quick observation suggests that increasing the scroll size reduces the latency of the operation. While I suspect that larger scroll sizes will generally reduce throughput, I have no idea whether this hypothesis is correct. I also do not know whether there are other consequences that I am not anticipating right now.

So to summarize, my question is: what impact does changing Elasticsearch's scroll size have, especially on performance, in a scenario where the results are processed for each batch that is obtained?

Thanks in advance!

1 answer

The one consideration (and the only one I know of) is being able to process each batch fast enough that the scroll context is not released; how long the context is kept alive is controlled by the ?scroll=X parameter.
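A minimal sketch of that loop, assuming the official elasticsearch-py client with 8.x-style keyword arguments: the opening `search` call creates the scroll context with a keep-alive, each `scroll` call fetches the next batch and renews the keep-alive, and the code times the processing of every batch so you notice when it exceeds that window. The function name `scroll_batches`, the `process` callback, and the `keep_alive_s` parameter are hypothetical illustrations. If you do not need per-batch timing, `elasticsearch.helpers.scan` already wraps a similar scroll loop.

```python
import time
from typing import Any, Callable, Dict, List


def scroll_batches(
    client: Any,
    index: str,
    query: Dict[str, Any],
    size: int,
    keep_alive: str,
    keep_alive_s: float,
    process: Callable[[List[Dict[str, Any]]], None],
) -> int:
    """Scroll through every hit of `query`, handing each batch to `process`.

    `client` is assumed to expose elasticsearch-py style `search`, `scroll`
    and `clear_scroll` methods. `keep_alive` is the ?scroll value (e.g. "2m");
    `keep_alive_s` is the same duration in seconds, used only for the timing
    check. Returns the number of hits processed.
    """
    # Opening search: creates the scroll context and returns the first batch.
    resp = client.search(index=index, query=query, scroll=keep_alive, size=size)
    scroll_id = resp["_scroll_id"]
    total = 0
    try:
        while True:
            hits = resp["hits"]["hits"]
            if not hits:
                break  # an empty batch means the scroll is exhausted
            started = time.monotonic()
            process(hits)
            elapsed = time.monotonic() - started
            # If processing outlasted the keep-alive, the context has probably
            # expired already, so fail loudly instead of calling scroll again.
            if elapsed > keep_alive_s:
                raise RuntimeError(
                    f"batch took {elapsed:.1f}s, longer than the {keep_alive} "
                    "keep-alive; reduce size or raise the keep-alive"
                )
            total += len(hits)
            # Fetch the next batch; this request also renews the keep-alive.
            resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp["_scroll_id"]
    finally:
        # Best effort: free the server-side context instead of waiting for it
        # to time out (it may already be gone if the keep-alive elapsed).
        try:
            client.clear_scroll(scroll_id=scroll_id)
        except Exception:
            pass
    return total
```

A real call might look like `scroll_batches(Elasticsearch("http://localhost:9200"), "my-index", {"match_all": {}}, 1000, "2m", 120, my_processor)`, where the URL, index name, and `my_processor` are placeholders.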

Assuming that you will consume all the data from the query, the scroll size should be tuned based on network conditions and the performance of your third-party application, i.e.:

- if your app can process data in a stream-like manner, bigger chunks are better;
- if your app processes data in batches (waiting for the full ES response first), the batch size should be small enough that processing time stays below the scroll release time;
- if you work in a poor network environment, a smaller batch size is better, because it reduces the overhead of dropped connections and retries;
- in general, a bigger batch is better, as it eliminates some network and ES CPU overhead.
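The batch-mode rule above (processing time < scroll release time) can be turned into a rough upper bound for the scroll size. This is a sketch assuming you have measured an average per-document processing time for your external library; the function name `max_scroll_size` and the 50% safety margin are illustrative choices, not Elasticsearch settings. The cap of 10,000 reflects the default `index.max_result_window`, which also limits scroll batch sizes.

```python
def max_scroll_size(
    keep_alive_s: float,
    per_doc_ms: float,
    safety: float = 0.5,
    max_result_window: int = 10_000,
) -> int:
    """Largest batch size whose estimated processing time fits the keep-alive.

    keep_alive_s: the ?scroll keep-alive, in seconds.
    per_doc_ms: measured average processing time per document, in milliseconds.
    safety: fraction of the keep-alive that processing may use, leaving
        headroom for network latency and unusually slow batches (assumed value).
    max_result_window: upper limit Elasticsearch applies to scroll sizes
        (index.max_result_window, 10,000 by default).
    """
    if per_doc_ms <= 0:
        raise ValueError("per_doc_ms must be positive")
    budget_ms = keep_alive_s * 1000 * safety
    size = int(budget_ms // per_doc_ms)
    if size < 1:
        raise ValueError("even one document does not fit; raise the keep-alive")
    # Never ask for more than the index allows in a single scroll batch.
    return min(size, max_result_window)
```

For example, with a `?scroll=1m` keep-alive and 5 ms per document, `max_scroll_size(60, 5)` gives 6000. For stream-like processing the bound matters less, and the largest size your network and memory handle comfortably is usually the better choice.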


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.


Accepted answer (score 0), 2017-06-07 12:11:09
