Running out of memory writing to a file in NodeJS

I'm processing a very large amount of data, manipulating it and storing the result in a file. I iterate over the dataset, then I want to store it all in a JSON file.

My initial method, using fs to collect everything in one object and then dump it, didn't work: I was running out of memory and it became extremely slow.

I'm now using fs.createWriteStream, but as far as I can tell it's still storing it all in memory.

I want the data to be written to the file object by object, unless someone can recommend a better way of doing it.

Part of my code:

  // Top of the file
  var wstream = fs.createWriteStream('mydata.json');
  ...

  // In a loop
  let JSONtoWrite = {}
  JSONtoWrite[entry.word] = wordData

  wstream.write(JSON.stringify(JSONtoWrite))

  ...
  // Outside my loop (when memory is probably maxed out)
  wstream.end()

I think I'm using streams wrong. Can someone tell me how to write all this data to a file without running out of memory? Every example I find online is about reading a stream in, but because of the calculations I'm doing on the data, I can't use a readable stream. I need to add to this file sequentially.

The problem is that you're not waiting for the data to be flushed to the filesystem; instead, you keep pushing more and more data into the stream synchronously in a tight loop. Everything that has not been written yet sits in the stream's internal buffer, so memory use keeps growing.

Here's a piece of pseudocode that should work for you:

    // Top of the file
    const fs = require('fs');
    const wstream = fs.createWriteStream('mydata.json');
    // I'm not sure how you're getting the data; let's say you have it all in an object
    const entry = {};
    const words = Object.keys(entry);

    function writeCB(index) {
       if (index >= words.length) {
           wstream.end();
           return;
       }

       const JSONtoWrite = {};
       JSONtoWrite[words[index]] = entry[words[index]];
       // Write the next entry only after this one has been flushed
       wstream.write(JSON.stringify(JSONtoWrite), () => writeCB(index + 1));
    }

    // Start the chain with the first entry
    writeCB(0);
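Another way to respect the stream's buffer is to check the return value of `write()` and wait for the `'drain'` event whenever it returns `false`. The following is a minimal sketch of that idea, not the original author's code: `writeEntries` is a hypothetical helper, and it assumes the data can be supplied as an iterable of `[word, wordData]` pairs and that one JSON object per line is an acceptable output format.

```javascript
const { once } = require('events');

// Hypothetical helper: writes each [word, wordData] pair as one JSON line,
// pausing whenever the stream reports that its internal buffer is full.
async function writeEntries(wstream, entries) {
  for (const [word, wordData] of entries) {
    const line = JSON.stringify({ [word]: wordData }) + '\n';
    // write() returns false once the buffered data reaches highWaterMark
    if (!wstream.write(line)) {
      await once(wstream, 'drain'); // resume only after the buffer has emptied
    }
  }
  wstream.end();
  await once(wstream, 'finish'); // resolves when everything has been flushed
}
```

It could be called as `writeEntries(fs.createWriteStream('mydata.json'), Object.entries(data))`, or with a generator that computes each pair lazily so the whole dataset never has to be in memory at once.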

You should wrap your data source in a readable stream too. I don't know what your source is, but you have to make sure it does not load all of your data into memory.

For example, assuming your data set comes from another file in which JSON objects are separated by end-of-line characters, you could create a read stream as follows:

const { Readable } = require('stream');

class JSONReader extends Readable {
  constructor(options = {}) {
    super(options);
    this._source = options.source; // the source stream
    this._buffer = '';
    this._wanted = false; // true while the consumer is asking for data
    this._source.setEncoding('utf8');
    // try again whenever the source has more data
    this._source.on('readable', () => this._pump());
    this._source.on('end', () => {
      // the last line may lack a trailing end of line
      if (this._buffer.trim()) this._pushLine(this._buffer);
      this.push(null); // signal the end of this stream
    });
  }

  _read(size) {
    this._wanted = true;
    this._pump();
  }

  _pushLine(line) {
    const result = JSON.parse(line); // do your calculations on `result` here
    return this.push(JSON.stringify(result) + '\n'); // push to the internal read queue
  }

  _pump() {
    while (this._wanted) {
      const lineIndex = this._buffer.indexOf('\n'); // find end of line
      if (lineIndex === -1) {
        const chunk = this._source.read(); // no complete line yet: read more from source
        if (chunk === null) return; // nothing available; wait for the next 'readable'
        this._buffer += chunk;
        continue;
      }
      const line = this._buffer.slice(0, lineIndex); // the characters of one object
      this._buffer = this._buffer.slice(lineIndex + 1);
      if (line.trim()) this._wanted = this._pushLine(line); // skip empty lines
    }
  }
}

Now you can use:

const source = fs.createReadStream('mySourceFile');
const reader = new JSONReader({source});
const target = fs.createWriteStream('myTargetFile');
reader.pipe(target);

Then you'll have a better memory flow:

[Figure: synchronous vs. stream memory management]

Please note that the picture and the example above come from the excellent book Node.js in Practice.
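The same line-by-line processing can also be expressed as a `Transform` stream, which lets Node handle backpressure between the source and the target. This is only a sketch under assumptions: `jsonLineTransform` and its `mapFn` parameter are hypothetical names, with `mapFn` standing in for whatever calculation is applied to each object, and the input is assumed to hold one JSON object per line.

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical transform: splits incoming text on '\n', parses each line as
// JSON, applies mapFn, and emits one JSON line per result.
function jsonLineTransform(mapFn) {
  const decoder = new StringDecoder('utf8'); // handles characters split across chunks
  let buffer = '';

  function emitLine(stream, line) {
    if (line.trim()) {
      stream.push(JSON.stringify(mapFn(JSON.parse(line))) + '\n');
    }
  }

  return new Transform({
    transform(chunk, encoding, callback) {
      buffer += typeof chunk === 'string' ? chunk : decoder.write(chunk);
      const lines = buffer.split('\n');
      buffer = lines.pop(); // keep the trailing partial line for the next chunk
      try {
        for (const line of lines) emitLine(this, line);
        callback();
      } catch (err) {
        callback(err); // e.g. a line that is not valid JSON
      }
    },
    flush(callback) {
      try {
        emitLine(this, buffer + decoder.end()); // last line without a newline
        callback();
      } catch (err) {
        callback(err);
      }
    },
  });
}
```

A typical use would pass it as the middle stage of `stream.pipeline`, between `fs.createReadStream('mySourceFile')` and `fs.createWriteStream('myTargetFile')`, so that an error in any stage closes all three streams.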
