How to bulk insert a 600MB JSON file into Elasticsearch?

Question

I am trying to insert a 600MB JSON file (which may grow in the future) into Elasticsearch. However, I get the error below:

Error: "toString()" failed

I am using the stream-json npm package, but no luck :( What is the best way to do this? I am thinking of chunking the file, but if there's a better way, that would be great.

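const fs = require('fs');
const StreamValues = require('stream-json/streamers/StreamValues');

var makeBulk = function(csList, callback){
  // Accumulates action/document pairs for the bulk API
  const bulk = [];
  const pipeline = fs.createReadStream('./CombinedServices_IBC.json').pipe(StreamValues.withParser());
  // StreamValues emits each top-level JSON value whole, so data.value is the entire FeatureCollection
  pipeline.on('data', data => {
    for(var index in data.value.features){
      bulk.push(
        { index: {_index: 'combinedservices1', _type: '_doc', _id: data.value.features[index].properties.OBJECTID } },
        {
          'geometry': data.value.features[index].geometry,
          'properties': data.value.features[index].properties
        }
      );
    }
    callback(bulk);

  });

}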

Answer 1

There is a tool for this use case: Elasticdump (https://github.com/taskrabbit/elasticsearch-dump).

Installing elasticsearch-dump

npm install elasticdump -g
elasticdump

Importing JSON into ES

elasticdump \
  --input=./CombinedServices_IBC.json \
  --output=http://127.0.0.1:9200/my_index \
  --type=data

Answer 2

Don't insert a single 600MB bulk request. The default bulk queue can hold up to 200 bulk requests in JVM heap space; imagine if each one is 600MB: you will get OOM and GC problems.

Refer to https://www.elastic.co/guide/en/elasticsearch/guide/current/bulk.html#_how_big_is_too_big ; for example, the Logstash elasticsearch output plugin sends bulks of up to 20MB.
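
To make the "smaller bulks" advice concrete, here is a minimal sketch of splitting the features into size-capped bulk bodies. The helper name chunkBulk and the limits (1000 documents, 10MB) are assumptions chosen for illustration, not values from the Elasticsearch docs; the index name and document shape mirror the question's code.

```javascript
// Hypothetical helper: groups bulk action/document pairs into batches that
// stay under a document-count limit and an approximate byte-size limit.
function* chunkBulk(features, indexName, maxDocs = 1000, maxBytes = 10 * 1024 * 1024) {
  let body = [];
  let docs = 0;
  let bytes = 0;
  for (const feature of features) {
    const action = { index: { _index: indexName, _type: '_doc', _id: feature.properties.OBJECTID } };
    const doc = { geometry: feature.geometry, properties: feature.properties };
    // Approximate on-the-wire size: two JSON lines plus their newlines
    const size = Buffer.byteLength(JSON.stringify(action)) +
                 Buffer.byteLength(JSON.stringify(doc)) + 2;
    // Flush the current batch before it would exceed either limit
    if (docs > 0 && (docs >= maxDocs || bytes + size > maxBytes)) {
      yield body;
      body = [];
      docs = 0;
      bytes = 0;
    }
    body.push(action, doc);
    docs += 1;
    bytes += size;
  }
  // Emit whatever is left over
  if (docs > 0) yield body;
}
```

Each yielded array could then be sent as one bulk request, one after another (for instance as the body of the client's bulk call), so only one modest batch is in flight at a time. Note that this only helps if the features themselves are read incrementally; loading the whole 600MB document into memory first would still hit the original error.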
