
How to trace cause of failing elasticsearch bulk import?

I'm currently trying to import over 600,000 documents into my Elasticsearch server.

I can import 10,000 products using the javascript client with no issues, but when I try to import all of them, I run into this error:

ELASTIC_HOST="hostname:9200" node import.js --trace_debug_json=true
buffer.js:382
    throw new Error('toString failed');
    ^

Error: toString failed
    at Buffer.toString (buffer.js:382:11)
    at Object.fs.readFileSync (fs.js:461:33)
    at Object.Module._extensions..js (module.js:441:20)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:311:12)
    at Function.Module.runMain (module.js:467:10)
    at startup (node.js:134:18)
    at node.js:961:3

The import.js file is composed like this (truncated here, because it's a total of 1281687 lines):

if (!process.env.ELASTIC_HOST) throw new Error('set ELASTIC_HOST (example: "127.0.0.1:9200")');
var elasticsearch = require('elasticsearch');
var client = new elasticsearch.Client({ host: process.env.ELASTIC_HOST, log: 'trace' });
client.bulk({body: [
  { index: { _index: 'products', _type: 'product', _id: 12800223350 } },
  { slug: '12800223350', mfrCatNum: "945R4", name: "Heavy Duty Battery", fulltechDesc: "1 Cell; 6 V; Connection Type Screw Terminal; Used For Lantern; Heavy Duty", invoiceDescription: "6V HD Lantern Battery" , twokDesc: "1 Cell; 6 V; Connection Type Screw Terminal; Used For Lantern; Heavy Duty" },

  /* more documents here */

  { index: { _index: 'products', _type: 'product', _id: 754473940287 } },
  { slug: '754473940287', mfrCatNum: "B30-R10000-KB-16", name: "Heavy-Duty Print Ribbon", fulltechDesc: "Print Ribn", mfrDescription: "B30 Series Heavy-Duty Print Ribbon - Black/Blue", invoiceDescription: "Print Ribn" },
]}, function(err, resp) {
  console.log(err);
});

How can I trace the source of the error, so that I can upload all my documents and actually evaluate Elasticsearch for my current needs?

You're hitting an error telling you that you're trying to store too much data in a single buffer (indirectly via your huge bulk call, of course, because the JS client will concatenate the bulk array into one huge string buffer). If memory serves, the maximum buffer size is 256MB, so with 600K documents you're probably over that limit.

I would suggest splitting your call into several smaller calls... glancing at your data, you might be able to do this in two calls, maybe three. Give it a shot and let us know how it goes.
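A minimal sketch of that splitting (the `toBulkBodies` helper and the chunk size are my own illustration, not part of the elasticsearch client): build several smaller bulk bodies from the document array, then send each one in turn instead of one giant request.

```javascript
// Hypothetical helper (not part of the elasticsearch client): split the
// documents into bulk bodies of at most chunkSize documents each, so no
// single request has to be serialized into one enormous string.
function toBulkBodies(docs, chunkSize) {
  var bodies = [];
  for (var i = 0; i < docs.length; i += chunkSize) {
    var body = [];
    docs.slice(i, i + chunkSize).forEach(function (doc) {
      // Same action/document pairs as in the original script.
      body.push({ index: { _index: 'products', _type: 'product', _id: doc.slug } });
      body.push(doc);
    });
    bodies.push(body);
  }
  return bodies;
}
```

Each body can then be sent in sequence with `client.bulk({ body: body }, function (err, resp) { ... })`, waiting for each response before sending the next chunk, which also makes it easy to log which chunk (if any) fails.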
