
Elasticsearch Indexing performance tuning

I am loading data into a two-node Elasticsearch cluster (5 shards each) with Apache Flume. A single Flume agent uses an ExecSource (a cat command), a file channel, and a custom sink built on the Elasticsearch Bulk and XContentBuilder Java APIs.

Flume collects events at a rate of 8000 events/sec (each event is 246 bytes), but indexing into Elasticsearch drops the rate to 3000 events/sec.

How can I tune indexing performance of elasticsearch to get my throughput close to the rate of collection in flume?

I have written a script which you can download here. It's a shell script, but I'm sure you can translate it to whatever platform you use. Many variables affect indexing performance, including hardware and system settings, and there are quite a lot of resources out there on the subject.

I would look at the way Logstash writes to Elasticsearch. Specifically, its index template sets the refresh interval (index.refresh_interval) to 5s in order to speed up indexing. You may also want to test whether compression helps or hinders.
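As a rough illustration of that suggestion, the interval Logstash sets corresponds to the index.refresh_interval index setting, which can be changed through the index settings REST endpoint. The sketch below uses only the JDK's java.net.http client (Java 11+). The node address http://localhost:9200 and the index name flume-events are assumptions for the example, not values from the question, and the class and method names are made up for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: raises index.refresh_interval on one index.
final class RefreshIntervalSketch {

    // JSON body for the settings update; interval is a time value such as "5s".
    static String settingsBody(String interval) {
        return "{\"index\":{\"refresh_interval\":\"" + interval + "\"}}";
    }

    // Builds (but does not send) PUT <baseUrl>/<index>/_settings.
    static HttpRequest buildRequest(String baseUrl, String index, String interval) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(settingsBody(interval)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed values: adjust the address and index name to your cluster.
        HttpRequest request = buildRequest("http://localhost:9200", "flume-events", "5s");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(settingsBody("5s"));

        // Only contact the cluster when explicitly asked to.
        if (args.length > 0 && args[0].equals("--send")) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }
}
```

A longer refresh interval means newly indexed documents take longer to become searchable, so it is worth measuring throughput before and after the change rather than assuming a gain.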

Otherwise, I would increase your cluster size.

Use a memory channel instead of the file channel; it will increase the output rate several times over.
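A minimal sketch of what that change could look like in the Flume agent's properties file. The agent and component names (agent, catSrc, memCh, esSink) and the capacity numbers are placeholders chosen for illustration, not values from the question; type, capacity and transactionCapacity are the standard memory channel properties.

```properties
# Hypothetical names: agent "agent", source "catSrc", sink "esSink".
agent.channels = memCh

# Memory channel: events are buffered in the JVM heap instead of on disk.
agent.channels.memCh.type = memory
# Maximum number of events held in the channel (example value).
agent.channels.memCh.capacity = 100000
# Maximum number of events per source/sink transaction (example value).
agent.channels.memCh.transactionCapacity = 1000

# Point the existing source and sink at the new channel.
agent.sources.catSrc.channels = memCh
agent.sinks.esSink.channel = memCh
```

The trade-off is durability: events sitting in a memory channel are lost if the agent process dies, whereas the file channel persists them to disk.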

