I am loading data into a two-node elasticsearch cluster (5 shards each) using apache-flume. There is one flume agent with an ExecSource (cat command), a file channel, and a custom sink built on the elasticsearch Bulk and XContentBuilder Java APIs.
Flume collects events at a rate of 8000 events/sec (each event is 246 bytes), but indexing into elasticsearch runs at only 3000 events/sec.
How can I tune elasticsearch indexing performance so that throughput gets close to the collection rate in flume?
I have written a script which you can download here. It's a shell script, but I'm sure you can translate it to whatever platform you use. There are many variables in indexing performance, including hardware and system variables, and there are quite a lot of resources out there on the subject.
I would look at the way logstash writes to elasticsearch. Specifically, it sets the index refresh interval (index.refresh_interval) to 5s in order to speed up indexing. You may also want to test whether compression helps or hinders.
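As a rough sketch of the refresh interval change, the snippet below builds a request against the elasticsearch index settings endpoint (PUT /<index>/_settings) using only the Python standard library. The host http://localhost:9200, the index name flume-events, and both function names are assumptions made for illustration, not details from the question.

```python
import json
import urllib.request


def build_refresh_request(base_url, index, interval="5s"):
    """Build a PUT request that sets index.refresh_interval on one index."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def apply_refresh_interval(base_url, index, interval="5s"):
    """Send the request; needs a reachable cluster (assumed, not verified here)."""
    req = build_refresh_request(base_url, index, interval)
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read().decode("utf-8")


# Example call (hypothetical host and index name):
# apply_refresh_interval("http://localhost:9200", "flume-events", "5s")
```

A longer interval means new documents take longer to become searchable, so it is worth measuring throughput before and after the change.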
Otherwise, I would increase your cluster size.
Use a memory channel instead of the file channel; it will increase the output speed several times over.
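A minimal sketch of what that change might look like in the flume agent configuration. The agent and component names (agent1, ch1, src1, sink1) are hypothetical, and the capacity values are placeholders to tune, not recommendations.

```properties
# Hypothetical names: agent1, ch1, src1, sink1
agent1.channels = ch1

# Memory channel in place of the file channel
agent1.channels.ch1.type = memory
# Maximum number of events held in the channel (placeholder value)
agent1.channels.ch1.capacity = 100000
# Maximum number of events per transaction (placeholder value)
agent1.channels.ch1.transactionCapacity = 1000

# Wire the existing source and sink to the channel
agent1.sources.src1.channels = ch1
agent1.sinks.sink1.channel = ch1
```

The trade-off is durability: events buffered in a memory channel are lost if the agent process dies, whereas the file channel persists them to disk.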