
Running out of memory writing to a file in NodeJS

I'm processing a very large amount of data that I'm manipulating and storing in a file. I iterate over the dataset, and then I want to store it all in a JSON file.

My initial method using fs, storing it all in an object and then dumping it, didn't work: I was running out of memory and it became extremely slow.

I'm now using fs.createWriteStream but as far as I can tell it's still storing it all in memory.

I want the data to be written object by object to the file, unless someone can recommend a better way of doing it.

Part of my code:

  // Top of the file
  var wstream = fs.createWriteStream('mydata.json');
  ...

  // In a loop
  let JSONtoWrite = {}
  JSONtoWrite[entry.word] = wordData

  wstream.write(JSON.stringify(JSONtoWrite))

  ...
  // Outside my loop (when memory is probably maxed out)
  wstream.end()

I think I'm using streams wrong. Can someone tell me how to write all this data to a file without running out of memory? Every example I find online is about reading a stream in, but because of the calculations I'm doing on the data, I can't use a readable stream. I need to add to this file sequentially.

The problem is that you're not waiting for the data to be flushed to the filesystem; instead you keep pushing more and more data into the stream synchronously in a tight loop.

Here's a piece of code that should work for you:

    // Top of the file
    const fs = require('fs');
    const wstream = fs.createWriteStream('mydata.json');
    // I'm not sure how you're getting the data; let's say you have it all in an object
    const entry = {};
    const words = Object.keys(entry);

    function writeCB(index) {
        if (index >= words.length) {
            wstream.end();
            return;
        }

        const JSONtoWrite = {};
        JSONtoWrite[words[index]] = entry[words[index]];
        // the callback fires once this chunk has been handled, so the next
        // write only happens after the previous one has been processed
        wstream.write(JSON.stringify(JSONtoWrite), () => writeCB(index + 1));
    }

    writeCB(0); // kick off the sequential writes
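
A different way to handle the same back-pressure, shown here only as a sketch and not part of the original answer, is to check the boolean that wstream.write() returns: it is false when the internal buffer is full, and the stream emits a 'drain' event once it is safe to write again. The generateEntries() function below is a hypothetical async generator that yields one object at a time.

    const fs = require('fs');
    const wstream = fs.createWriteStream('mydata.json');

    // generateEntries() is a placeholder for however you produce your data,
    // one { word: data } object at a time, without holding it all in memory.
    async function writeAll(generateEntries) {
        for await (const entry of generateEntries()) {
            const ok = wstream.write(JSON.stringify(entry) + '\n');
            if (!ok) {
                // buffer is full: wait for 'drain' before writing the next object
                await new Promise(resolve => wstream.once('drain', resolve));
            }
        }
        wstream.end();
    }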

You should wrap your data source in a readable stream too. I don't know what your source is, but you have to make sure it does not load all your data into memory.

For example, assuming your data set comes from another file where JSON objects are separated by end-of-line characters, you could create a Readable stream as follows:

const fs = require('fs');
const Readable = require('stream').Readable;

class JSONReader extends Readable {
  constructor(options = {}) {
    super(options);
    this._source = options.source; // the source stream
    this._buffer = '';
    // trigger a read whenever the source has data ready
    this._source.on('readable', () => this.read(0));
  }

  _read(size) {
    if (this._buffer.length === 0) {
      const chunk = this._source.read(); // read more from the source when the buffer is empty
      if (chunk === null) {
        return; // nothing available yet, wait for the next 'readable' event
      }
      this._buffer += chunk;
    }
    const lineIndex = this._buffer.indexOf('\n'); // find the end of the line
    if (lineIndex !== -1) { // we have an end of line and therefore a complete object
      const line = this._buffer.slice(0, lineIndex); // get the characters belonging to that object
      if (line) {
        const result = JSON.parse(line);
        this._buffer = this._buffer.slice(lineIndex + 1);
        this.push(JSON.stringify(result)); // push to the internal read queue
      } else {
        this._buffer = this._buffer.slice(1); // skip the empty line
      }
    }
  }
}

Now you can use it like this:

const source = fs.createReadStream('mySourceFile');
const reader = new JSONReader({source});
const target = fs.createWriteStream('myTargetFile');
reader.pipe(target);
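
If you are on Node 10 or later, stream.pipeline() does the same wiring as pipe() but also gives you a single completion/error callback; this is just an optional variation on the example above:

const { pipeline } = require('stream');

pipeline(reader, target, (err) => {
  if (err) {
    console.error('pipeline failed', err);
  } else {
    console.log('all data written to myTargetFile');
  }
});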

Then you'll have a better memory profile:

[Figure: synchronous vs. stream memory management]

Please note that the picture and the above example are taken from the excellent Node.js in Practice book.
