
How to trace cause of failing elasticsearch bulk import?

I'm currently trying to import over 600,000 documents into my Elasticsearch server.

I can import 10,000 products using the JavaScript client with no issues, but when I try to import all of them at once, I run into this error:

ELASTIC_HOST="hostname:9200" node import.js --trace_debug_json=true
buffer.js:382
    throw new Error('toString failed');
    ^

Error: toString failed
    at Buffer.toString (buffer.js:382:11)
    at Object.fs.readFileSync (fs.js:461:33)
    at Object.Module._extensions..js (module.js:441:20)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:311:12)
    at Function.Module.runMain (module.js:467:10)
    at startup (node.js:134:18)
    at node.js:961:3

The import.js file looks like this (truncated, because the full file is 1,281,687 lines):

if (!process.env.ELASTIC_HOST) throw new Error('set ELASTIC_HOST (example: "127.0.0.1:9200")');
var elasticsearch = require('elasticsearch');
var client = new elasticsearch.Client({ host: process.env.ELASTIC_HOST, log: 'trace' });
client.bulk({body: [
  { index: { _index: 'products', _type: 'product', _id: 12800223350 } },
  { slug: '12800223350', mfrCatNum: "945R4", name: "Heavy Duty Battery", fulltechDesc: "1 Cell; 6 V; Connection Type Screw Terminal; Used For Lantern; Heavy Duty", invoiceDescription: "6V HD Lantern Battery" , twokDesc: "1 Cell; 6 V; Connection Type Screw Terminal; Used For Lantern; Heavy Duty" },

  /* more documents here */

  { index: { _index: 'products', _type: 'product', _id: 754473940287 } },
  { slug: '754473940287', mfrCatNum: "B30-R10000-KB-16", name: "Heavy-Duty Print Ribbon", fulltechDesc: "Print Ribn", mfrDescription: "B30 Series Heavy-Duty Print Ribbon - Black/Blue", invoiceDescription: "Print Ribn" },
]}, function(err, resp) {
  console.log(err);
});

How can I trace the source of the error, so that I can import all my documents and actually evaluate Elasticsearch for my current needs?

You're hitting an error telling you that too much data is being turned into a single string. The stack trace shows Node failing inside fs.readFileSync while loading import.js itself, because the file containing your huge inlined bulk call is too large to read into one string. If memory serves, the maximum string size in Node is around 256MB, so with 600K documents inlined you're probably over that limit.

I would suggest splitting your import into several smaller bulk calls... glancing at your data, you could probably do it in two or three calls, and keeping the documents out of the script itself will also avoid the file-size problem. Give it a shot and let us know how it goes.
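For example, here is a minimal sketch of a batched import. It is not your exact setup: it assumes the product objects live in a newline-delimited JSON file (products.ndjson is a hypothetical name, one JSON object per line) instead of being inlined in import.js, so no single string ever has to hold all 600K documents.

// batched-import.js -- a sketch, assuming products.ndjson holds one JSON product per line
var fs = require('fs');
var readline = require('readline');
var elasticsearch = require('elasticsearch');

if (!process.env.ELASTIC_HOST) throw new Error('set ELASTIC_HOST (example: "127.0.0.1:9200")');
var client = new elasticsearch.Client({ host: process.env.ELASTIC_HOST });

var BATCH_SIZE = 5000; // documents per bulk request; tune to keep each request small
var docs = [];

// Stream the file line by line so we never build one giant string.
var rl = readline.createInterface({
  input: fs.createReadStream('products.ndjson'),
  terminal: false
});

rl.on('line', function (line) {
  if (line.trim()) docs.push(JSON.parse(line)); // one product object per line
});

rl.on('close', function () {
  importBatch(0);
});

// Send one bulk request per batch, waiting for each response before
// starting the next, so only BATCH_SIZE documents are serialized at a time.
function importBatch(start) {
  if (start >= docs.length) {
    console.log('imported', docs.length, 'documents');
    return;
  }
  var body = [];
  docs.slice(start, start + BATCH_SIZE).forEach(function (doc) {
    body.push({ index: { _index: 'products', _type: 'product', _id: doc.slug } });
    body.push(doc);
  });
  client.bulk({ body: body }, function (err, resp) {
    if (err) throw err;
    if (resp.errors) console.error('some items failed in the batch starting at', start);
    importBatch(start + BATCH_SIZE);
  });
}

With BATCH_SIZE at 5000 the 600K documents go out as roughly 120 bulk requests, each far below the string limit, and each response's errors flag tells you whether any item in that batch failed.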
