
How to speed up Elasticsearch scroll in Python

I need to fetch data for a certain period of time through the Elasticsearch API, run some customized analysis on it in Python, and display the results on a dashboard.

There are about two hundred thousand records every 15 minutes, indexed by date.

Currently I use scroll/scan to fetch the data, but it takes nearly a minute to retrieve 200,000 records, which seems too slow.

Is there any way to process this data more quickly? And can I use something like Redis to save the results and avoid repeating the work?

Is it possible to do the analysis on the Elasticsearch side using aggregations?
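For example, if the analysis is something like an average per minute, a date_histogram with a sub-aggregation returns the answer directly, so the 200,000 raw documents never leave the cluster. A minimal sketch, assuming a hypothetical index "logs-2020.01.01" with "@timestamp" and "latency_ms" fields (adjust to your mapping):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

body = {
    "size": 0,  # we only want the aggregation buckets, not the raw hits
    "query": {"range": {"@timestamp": {"gte": "now-15m", "lt": "now"}}},
    "aggs": {
        "per_minute": {
            # use "interval" instead of "fixed_interval" on Elasticsearch < 7.2
            "date_histogram": {"field": "@timestamp", "fixed_interval": "1m"},
            "aggs": {"avg_latency": {"avg": {"field": "latency_ms"}}},
        }
    },
}

resp = es.search(index="logs-2020.01.01", body=body)
for bucket in resp["aggregations"]["per_minute"]["buckets"]:
    print(bucket["key_as_string"], bucket["avg_latency"]["value"])
```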

Assuming you're not doing it already, you should use _source filtering to download only the absolute minimum data required. You could also try increasing the size parameter of scan() from its default of 1000. I would expect only modest speed improvements from that, however.
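A minimal sketch of both tweaks using the elasticsearch-py scan() helper; the index name, field names, and time range are assumptions, not from the question:

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch(["http://localhost:9200"])

query = {
    "query": {"range": {"@timestamp": {"gte": "now-15m", "lt": "now"}}},
    # Download only the fields the analysis actually needs,
    # instead of the full documents.
    "_source": ["@timestamp", "latency_ms"],
}

# size controls how many documents each scroll round trip returns;
# raising it above the default 1000 reduces the number of round trips.
for hit in scan(es, query=query, index="logs-2020.01.01", size=5000):
    doc = hit["_source"]
    # ... feed doc into the analysis ...
```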

If the historical data doesn't change, then a cache like Redis (or even just a local file) could be a good solution. If the historical data can change, then you'd have to manage cache invalidation.
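A minimal caching sketch with redis-py, assuming each 15-minute window is immutable once it has passed; the key scheme and helper names are illustrative:

```python
import json

import redis

r = redis.Redis(host="localhost", port=6379)


def get_window_result(window_start, compute):
    """Return the analysis result for one 15-minute window, cached in Redis.

    window_start: a datetime marking the start of the window.
    compute: the expensive function that scans Elasticsearch and analyzes.
    """
    key = f"analysis:{window_start.isoformat()}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)  # reuse earlier work
    result = compute(window_start)  # expensive ES scan + analysis
    r.set(key, json.dumps(result), ex=7 * 24 * 3600)  # evict after a week
    return result
```

If the historical data can change, the ex= expiry above is not enough; you would need to delete or overwrite the affected keys whenever the underlying documents are updated.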
