Elasticsearch bulk update is extremely slow

I am indexing a large amount of daily data, roughly 160GB per index, into Elasticsearch. I am facing a case where I need to update almost all of the docs in the indices with a small amount of data (~16GB) which is in the following format:

id1,data1
id1,data2
id2,data1
id2,data2
id2,data3
.
.
.

My update operations start off at around 16,000 lines per second, but within a little over 5 minutes they drop to about 1,000 lines per second and never go back up. The update process for this 16GB of data currently takes longer than indexing the entire 160GB does.

My conf file for the update operation currently looks as follows:

output
{
    elasticsearch {
        action => "update"
        doc_as_upsert => true
        hosts => ["host1","host2","host3","host4"]
        index => "logstash-2017-08-1"
        document_id => "%{uniqueid}"
        document_type => "daily"
        retry_on_conflict => 2
        flush_size => 1000
    }

}
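
As a rough illustration of what this does, each id/data line effectively becomes a partial-update/upsert action in a bulk request. The sketch below expresses one such action with the elasticsearch-py bulk helper; the "data" field name and the sample rows are assumptions for illustration only.

# Sketch: the kind of bulk update/upsert produced by the Logstash output above,
# expressed with elasticsearch-py. The "data" field name and sample rows are
# assumptions; host, index, type and id come from the conf above.
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch(["http://host1:9200"])

def update_actions(rows):
    # rows: iterable of (uniqueid, data) tuples parsed from the input file
    for uniqueid, data in rows:
        yield {
            "_op_type": "update",
            "_index": "logstash-2017-08-1",
            "_type": "daily",
            "_id": uniqueid,
            "doc": {"data": data},
            "doc_as_upsert": True,
            "_retry_on_conflict": 2,
        }

bulk(es, update_actions([("id1", "data1"), ("id2", "data1")]), chunk_size=1000)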

The optimizations I have made to speed up indexing in my cluster, based on the suggestions at https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html, are the following (a sketch of applying them appears after the list):

  1. Setting "indices.store.throttle.type" : "none" 设置“ indices.store.throttle.type”:“无”
  2. Index "refresh_interval" : "-1" 索引“ refresh_interval”:“-1”

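As a quick sketch of how those two settings can be applied (assuming an elasticsearch-py client and the host/index from the conf above; note that indices.store.throttle.type only exists on older Elasticsearch releases):

# Sketch: applying the two indexing optimizations via elasticsearch-py.
# Host and index name are taken from the conf above; adjust as needed.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://host1:9200"])

# 1. Disable store throttling cluster-wide (only on ES versions that still
#    expose this setting; it was removed in later releases).
es.cluster.put_settings(body={"transient": {"indices.store.throttle.type": "none"}})

# 2. Disable refresh on the index while the bulk load/update runs.
es.indices.put_settings(index="logstash-2017-08-1",
                        body={"index": {"refresh_interval": "-1"}})

# Remember to restore refresh_interval (for example "30s") once the job finishes.
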
I am running my cluster on 4 d2.8xlarge EC2 instances. I have allocated 30GB of heap to each node. While the update is happening, barely any CPU is used and the load is very low as well.

Despite all of this, the update is extremely slow. Is there something obvious that I am missing that is causing this issue? While looking at the thread pool data, I see that the number of threads working on bulk operations is constantly high.
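
For reference, the bulk thread pool activity can be inspected through the cat API; here is a minimal sketch with elasticsearch-py (the pool is named "bulk" on 5.x and "write" on newer versions, and the host is a placeholder):

# Sketch: show active threads, queue depth and rejections for the bulk pool.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://host1:9200"])

# On ES 5.x the pool is called "bulk"; on 6.x+ it is called "write".
print(es.cat.thread_pool(thread_pool_patterns="bulk",
                         v=True,
                         h="node_name,name,active,queue,rejected"))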

Any help on this issue would be greatly appreciated.

Thanks in advance

There are a couple of rule-outs to try here.

Memory Pressure

With 244GB of RAM, this is not terribly likely, but you can still check it out. Find the jstat command in the JDK for your platform, though there are visual tools for some of them. You want to check both your Logstash JVM and the ElasticSearch JVMs.

jstat -gcutil -h7 {PID of JVM} 2s

This will give you a readout of the various memory pools, garbage collection counts, and GC timings for that JVM as it works. It will update every 2 seconds, and print headers every 7 lines. Spending excessive time in FGCT (full GC time) is a sign that you're under-allocated on heap.

I/O Pressure

The d2.8xlarge is a dense-storage instance, and may not be great for a highly random, small-block workload. If you're on a Unix platform, top will tell you how much time you're spending in IOWAIT state. If that's high, your storage isn't keeping up with the workload you're sending it.

If that's the case, you may want to consider provisioned-IOPS EBS volumes rather than the instance-local storage. Or, if your data will fit, consider an instance in the i3 family of high-I/O instances instead.

Logstash version

You don't say what version of Logstash you're using. Being StackOverflow, you're likely to be using 5.2. If that's the case, this isn't a rule-out.

But, if you're using something in the 2.x series, you may want to set the -w flag to 1 at first, and work your way up. Yes, that's single-threading this. But the ElasticSearch output has some concurrency issues in the 2.x series that are largely fixed in the 5.x series.

With Elasticsearch version 6.0 we had exactly the same issue of slow updates on AWS, and the culprit was slow I/O. The same data upserted completely fine on a local test stack, but once in the cloud on the EC2 stack, everything ground to a halt after an initial burst of speedy inserts lasting only a few minutes.

The local test stack was a low-spec server in terms of memory and CPU, but it had SSDs.

The EC2 stack used EBS volumes with the default gp2 300 IOPS.

Converting the volumes to type io1 with 3000 IOPS solved the issue and everything got back on track.
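
For example, this kind of in-place volume conversion can be scripted with boto3; a minimal sketch, where the region and volume ID are placeholders:

# Sketch: convert an existing EBS volume to io1 with 3000 provisioned IOPS.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.modify_volume(VolumeId="vol-0123456789abcdef0",
                  VolumeType="io1",
                  Iops=3000)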

I am using Amazon AWS Elasticsearch Service version 6.0. I need to do heavy writes/inserts from a series of JSON files into Elasticsearch, around 10 billion items. The elasticsearch-py bulk write speed was really slow most of the time, with only occasional bursts of high-speed writes. I tried all kinds of methods, such as splitting the JSON files into smaller pieces, reading the JSON files with multiple processes, and using parallel_bulk to insert into Elasticsearch; nothing worked. Finally, after I upgraded to an io1 EBS volume with 10,000 write IOPS, everything went smoothly.
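
For completeness, here is a minimal sketch of the parallel_bulk approach mentioned above (the file name, index name and document layout are assumptions); even with this, throughput stayed bound by the EBS volume's IOPS until the volume was upgraded:

# Sketch: stream newline-delimited JSON into Elasticsearch with parallel_bulk.
import json
from elasticsearch import Elasticsearch
from elasticsearch.helpers import parallel_bulk

es = Elasticsearch(["http://localhost:9200"])

def actions(path):
    # Stream actions from the file instead of loading everything into memory.
    with open(path) as f:
        for line in f:
            doc = json.loads(line)
            yield {
                "_op_type": "index",
                "_index": "items",
                "_type": "item",   # required on 6.x, dropped in later versions
                "_id": doc["id"],
                "_source": doc,
            }

# parallel_bulk returns a lazy generator; consume it so the work actually runs.
for ok, result in parallel_bulk(es, actions("items.ndjson"),
                                thread_count=4, chunk_size=1000):
    if not ok:
        print("failed:", result)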
