How to avoid updating the index for every document in the Elasticsearch bulk API

I'm using curl to add Apache log rows as documents to Elasticsearch through the bulk API. I post the following:

{"index": {"_type": "apache", "_id": "123", "_index": "apache-2017-01"}}
{"s": 200, "d": "example.se", "@t": "2017-01-01T00:00:00.000Z", "p": "/foo"}
{"index": {"_type": "apache", "_id": "124", "_index": "apache-2017-01"}}
{"s": 200, "d": "example.se", "@t": "2017-01-01T00:00:00.000Z", "p": "/bar"}
... more of the same ...
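A minimal Python sketch of how a bulk body like the one above can be generated from already-parsed log rows. The `build_bulk_body` helper and the `(id, source)` row tuples are hypothetical; the action lines mirror the ones in the question, including the `_type` field.

```python
import json

def build_bulk_body(rows, index, doc_type="apache"):
    """Build an NDJSON bulk body: one action line, then one source line, per document."""
    lines = []
    for doc_id, source in rows:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API expects every line, including the last one, to end with a newline.
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    rows = [
        ("123", {"s": 200, "d": "example.se", "@t": "2017-01-01T00:00:00.000Z", "p": "/foo"}),
        ("124", {"s": 200, "d": "example.se", "@t": "2017-01-01T00:00:00.000Z", "p": "/bar"}),
    ]
    print(build_bulk_body(rows, "apache-2017-01"), end="")
```

If the output is saved to a file and posted with curl, use `--data-binary @file` rather than `-d`, because `-d` strips the newlines the bulk format depends on.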

My guess is that for every log-row document, Lucene updates its index. But I don't need Elasticsearch to do that. I'm perfectly fine with adding all the log-row documents first and updating the index afterwards.

Is this possible? Is it a good idea? Will it possibly improve performance?

Your intuition is not far from the truth. By default, Elasticsearch refreshes its index every second:

The default index.refresh_interval is 1s, which forces Elasticsearch to create a new segment every second. Increasing this value (to, say, 30s) will allow larger segments to flush and decreases future merge pressure.

So one way to increase indexing throughput is to increase index.refresh_interval, or even disable refresh entirely by setting it to -1, and then turn it back on once you have finished your inserts. (Note that inserted documents become searchable only after the next refresh, when the segment they were written to is opened for search.)
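A minimal sketch of the "disable refresh, load, re-enable" pattern using only the Python standard library. It assumes a cluster at `http://localhost:9200` with no authentication; `refresh_interval_request` and `bulk_load_without_refresh` are hypothetical helper names, and `load` stands for whatever code sends your bulk requests. The settings are changed through the index `_settings` endpoint.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster, no auth

def refresh_interval_request(index, interval, base_url=ES_URL):
    """Build a PUT /<index>/_settings request that changes index.refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

def _send(request):
    """Send a request and read the response so the connection is released."""
    with urllib.request.urlopen(request) as response:
        return response.read()

def bulk_load_without_refresh(index, load, restore_interval="1s", base_url=ES_URL):
    """Disable refresh, run the caller's bulk inserts, then restore the interval."""
    _send(refresh_interval_request(index, "-1", base_url))  # -1 disables refresh
    try:
        load()  # hypothetical callback that sends all the _bulk requests
    finally:
        # Re-enable refresh even if loading fails, so new documents become searchable.
        _send(refresh_interval_request(index, restore_interval, base_url))
```

Restoring in a `finally` block is a design choice: if the load aborts halfway, the index is not left with refresh disabled.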

This, however, is not the only possible bottleneck when inserting documents into Elasticsearch. For example, you might consider using several threads to send bulk requests, or the other tweaks described in the "Tune for indexing speed" section of the Elasticsearch documentation. You can look up other indexing parameters you may want to change in the "Dynamic index settings" section.
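A minimal sketch of sending bulk requests from several threads. Documents are split into fixed-size chunks, and each chunk goes to a hypothetical `send_bulk` callable (for example, one that builds a bulk body and POSTs it to `_bulk`). The chunk size and worker count are illustrative assumptions, not recommended values.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def send_in_parallel(docs, send_bulk, chunk_size=1000, workers=4):
    """Send each chunk of docs as one bulk request on a thread pool.

    `send_bulk` is a hypothetical callable taking one chunk; its results
    are returned in chunk order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send_bulk, chunked(docs, chunk_size)))
```

`pool.map` submits every chunk up front, so for very large inputs you may want to limit how many chunks are in flight. Also check each bulk response's `errors` field, since a bulk request can succeed overall while individual documents fail.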

Hope that helps!