What considerations should I take into account when increasing the size in the Scroll API in Elasticsearch?

I am currently toying around with the Scroll API of Elasticsearch, and want to use it to obtain a large set of data and do some manual processing on it. The processing is performed by an external library and is not of the type that can easily be included as a script.

While this seems to work nicely at the moment, I was wondering what considerations I should take into account when fine-tuning the scroll size for this form of processing. A quick observation seems to indicate that increasing the scroll size reduces the latency of the operation. While I suspect that larger scroll sizes will generally reduce throughput, I have no idea whether this hypothesis is correct. I also do not know whether there are other consequences that I have not foreseen.

So to summarize, my question is: what impact does changing Elasticsearch's scroll size have, especially on performance, in a scenario where the results are processed for each batch that is obtained?

Thanks in advance!

The one consideration (and the only one I know of) is that you must process each batch fast enough that the scroll context is not released before you request the next one. The lifetime of the context is controlled by the ?scroll=X parameter.
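As a rough illustration of that constraint, the arithmetic can be sketched in Python. The keep-alive is renewed on every scroll request, so it only has to cover the processing of one batch. The function name, the per-document timing, and the safety factor below are assumptions made for the example; you would measure the per-document time for your own library.

```python
import math

def max_batch_size(keep_alive_s, per_doc_s, safety=0.5):
    """Largest batch that can be processed within the scroll keep-alive.

    keep_alive_s: scroll keep-alive in seconds (e.g. 60 for ?scroll=1m)
    per_doc_s:    measured processing time per document, in seconds
    safety:       hypothetical fraction of the keep-alive we allow ourselves
                  to use, leaving headroom for network delays and slow batches
    """
    if per_doc_s <= 0:
        raise ValueError("per_doc_s must be positive")
    return math.floor(keep_alive_s * safety / per_doc_s)

# With a 60 s keep-alive and 0.5 s per document, using half the window:
print(max_batch_size(60, 0.5))  # 60
```

If the number that comes out is uncomfortably small, raising the keep-alive is usually simpler than shrinking the batch.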

Assuming that you will consume all the data from the query, the scroll size should be tuned based on network and third-party app performance, i.e.:

  • if your app can process data in a stream-like manner, bigger chunks are better
  • if your app processes data in batches (waiting for the full ES response first), the upper limit for the batch size should guarantee that the processing time is less than the scroll release time
  • if you work in a poor network environment, a smaller batch size is better, to limit the overhead of dropped connections and retries
  • generally, a bigger batch is better, as it eliminates some network and ES CPU overhead
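For the batch-style case, the loop might look like the following sketch. It assumes a client object exposing `search`, `scroll` and `clear_scroll` with the keyword arguments used by the 7.x `elasticsearch` Python client; the function name and the default values are made up for the example, and newer client versions may prefer different keywords than `body`.

```python
def scroll_batches(client, index, query, size=1000, keep_alive="2m"):
    """Yield one list of hits per scroll page until the results run out.

    `size` is the batch size being tuned; `keep_alive` is the ?scroll=X value
    and is renewed on every request, so it only has to cover one batch.
    """
    resp = client.search(index=index, body=query, size=size, scroll=keep_alive)
    scroll_id = resp["_scroll_id"]
    try:
        while resp["hits"]["hits"]:
            # The caller processes this batch before the next request is sent,
            # so processing time must stay below keep_alive.
            yield resp["hits"]["hits"]
            resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp["_scroll_id"]
    finally:
        # Free the search context on the cluster once we are done.
        client.clear_scroll(scroll_id=scroll_id)

# Usage (assumes `es` is a connected client and `process` is your library call):
# for batch in scroll_batches(es, "my-index", {"query": {"match_all": {}}}, size=5000):
#     process(batch)
```

Timing `process(batch)` for a few different `size` values is probably the quickest way to find where larger batches stop paying off.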
