What considerations should I take into account when increasing the size in the Scroll API in Elasticsearch?
I am currently experimenting with the Scroll API of Elasticsearch, and want to use it to obtain a large set of data and do some manual processing on it. The processing is performed by an external library and is not of the type that can easily be included as a script.
While this seems to work nicely at the moment, I was wondering what considerations I should take into account when fine-tuning the scroll size for this form of processing. A quick observation seems to indicate that increasing the scroll size will reduce the latency of the operation. While I suspect that larger scroll sizes will generally reduce throughput, I have no idea whether this hypothesis is correct. I also do not know whether there are other consequences that I have not envisioned yet.
So to summarize, my question is: what impact does changing Elasticsearch's scroll size have, especially on performance, in a scenario where the results are processed for each batch that is obtained?
Thanks in advance!
The one consideration (and the only one I know of) is being able to process each batch fast enough that the scroll context is not released before the next request (its lifetime is controlled by the ?scroll=X parameter).
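To make the timing constraint concrete, here is a minimal sketch of a scroll loop that measures how long each batch takes to process. It assumes `es` behaves like the official Python client (`search`, `scroll`, `clear_scroll`, with `query=` accepted as a keyword as in recent client versions); the function name `scroll_all` and the `process` callback are hypothetical names chosen for illustration.

```python
import time


def scroll_all(es, index, query, size, keep_alive_s, process):
    """Scroll through every hit and time the processing of each batch.

    `es` is assumed to expose search/scroll/clear_scroll like the official
    Python client. Returns the slowest per-batch processing time in seconds.
    """
    keep_alive = f"{keep_alive_s}s"  # becomes the scroll=X keep-alive
    resp = es.search(index=index, query=query, size=size, scroll=keep_alive)
    scroll_id = resp["_scroll_id"]
    slowest = 0.0
    try:
        while resp["hits"]["hits"]:
            started = time.monotonic()
            process(resp["hits"]["hits"])  # external library call goes here
            slowest = max(slowest, time.monotonic() - started)
            # Each scroll request renews the keep-alive for the next batch.
            resp = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp["_scroll_id"]
    finally:
        # Free the search context instead of waiting for it to expire.
        es.clear_scroll(scroll_id=scroll_id)
    return slowest
```

If the returned value gets close to the keep-alive, either the batch size should come down or the keep-alive should go up.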
Assuming that you will consume all the data from the query, the scroll size should be tuned based on network and third-party application performance. I.e.
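As a rough illustration of that tuning (not part of the original answer), the batch size can be bounded from a measured per-document processing time and the chosen keep-alive. The helper name `max_safe_size` and the `safety` margin are assumptions made for this sketch; real measurements should also account for network transfer time.

```python
import math


def max_safe_size(keep_alive_s, per_doc_s, safety=0.5):
    """Estimate the largest batch that can be processed within the keep-alive.

    keep_alive_s: scroll keep-alive in seconds (the X in ?scroll=X).
    per_doc_s:    measured average processing time per document, in seconds.
    safety:       assumed fraction of the keep-alive we allow ourselves to use.
    """
    if per_doc_s <= 0:
        raise ValueError("per_doc_s must be positive")
    budget = keep_alive_s * safety  # time we are willing to spend per batch
    return max(1, math.floor(budget / per_doc_s))
```

For example, `max_safe_size(60, 0.25)` allows 30 seconds of processing per batch and therefore suggests at most 120 documents per scroll request.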