
How to trace cause of failing elasticsearch bulk import?

I'm currently trying to import over 600,000 documents into my Elasticsearch server.

I can import 10,000 products using the javascript client with no issues, but when I try to import all of them, I run into this error:

ELASTIC_HOST="hostname:9200" node import.js --trace_debug_json=true
buffer.js:382
    throw new Error('toString failed');
    ^

Error: toString failed
    at Buffer.toString (buffer.js:382:11)
    at Object.fs.readFileSync (fs.js:461:33)
    at Object.Module._extensions..js (module.js:441:20)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:311:12)
    at Function.Module.runMain (module.js:467:10)
    at startup (node.js:134:18)
    at node.js:961:3

The import.js file is composed like this (truncated here, because it's a total of 1281687 lines):

if (!process.env.ELASTIC_HOST) throw new Error('set ELASTIC_HOST (example: "127.0.0.1:9200")');
var elasticsearch = require('elasticsearch');
var client = new elasticsearch.Client({ host: process.env.ELASTIC_HOST, log: 'trace' });
client.bulk({body: [
  { index: { _index: 'products', _type: 'product', _id: 12800223350 } },
  { slug: '12800223350', mfrCatNum: "945R4", name: "Heavy Duty Battery", fulltechDesc: "1 Cell; 6 V; Connection Type Screw Terminal; Used For Lantern; Heavy Duty", invoiceDescription: "6V HD Lantern Battery" , twokDesc: "1 Cell; 6 V; Connection Type Screw Terminal; Used For Lantern; Heavy Duty" },

  /* more documents here */

  { index: { _index: 'products', _type: 'product', _id: 754473940287 } },
  { slug: '754473940287', mfrCatNum: "B30-R10000-KB-16", name: "Heavy-Duty Print Ribbon", fulltechDesc: "Print Ribn", mfrDescription: "B30 Series Heavy-Duty Print Ribbon - Black/Blue", invoiceDescription: "Print Ribn" },
]}, function(err, resp) {
  console.log(err);
});

How can I trace the source of the error, so that I can upload all my documents and actually evaluate Elasticsearch for my current needs?

You're hitting an error telling you that you're trying to store too much data in a single buffer (indirectly via your huge bulk call, of course, because the JS client will concatenate the bulk array into one huge string buffer). If memory serves, the maximum buffer size is 256MB, so with 600K documents you're probably over that limit.

I would suggest splitting your call into several smaller calls... glancing at your data, you might be able to do this in two calls, maybe three. Give it a shot and let us know how it goes.
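A minimal sketch of that splitting (the `toBulkBodies` helper and the chunk size are my own illustration, not part of the elasticsearch client): build several smaller bulk bodies from the document array, then send each one in turn instead of one giant request.

```javascript
// Hypothetical helper (not part of the elasticsearch client): split the
// documents into bulk bodies of at most chunkSize documents each, so no
// single request has to be serialized into one enormous string.
function toBulkBodies(docs, chunkSize) {
  var bodies = [];
  for (var i = 0; i < docs.length; i += chunkSize) {
    var body = [];
    docs.slice(i, i + chunkSize).forEach(function (doc) {
      // Same action/document pairs as in the original script.
      body.push({ index: { _index: 'products', _type: 'product', _id: doc.slug } });
      body.push(doc);
    });
    bodies.push(body);
  }
  return bodies;
}
```

Each body can then be sent in sequence with `client.bulk({ body: body }, function (err, resp) { ... })`, waiting for each response before sending the next chunk, which also makes it easy to log which chunk (if any) fails.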
