I'm processing a very large amount of data, manipulating it and storing it in a file. I iterate over the dataset, then I want to store it all in a JSON file.
My initial method using fs, which stored everything in an object and then dumped it, didn't work: I ran out of memory and it became extremely slow.
I'm now using fs.createWriteStream, but as far as I can tell it's still storing it all in memory.
I want the data to be written object by object to the file, unless someone can recommend a better way of doing it.
Part of my code:
// Top of the file
var wstream = fs.createWriteStream('mydata.json');
...
// In a loop
let JSONtoWrite = {}
JSONtoWrite[entry.word] = wordData
wstream.write(JSON.stringify(JSONtoWrite))
...
// Outside my loop (when memory is probably maxed out)
wstream.end()
I think I'm using streams wrong. Can someone tell me how to write all this data to a file without running out of memory? Every example I find online relates to reading a stream in, but because of the calculations I'm doing on the data, I can't use a readable stream. I need to add to this file sequentially.
The problem is that you're not waiting for the data to be flushed to the filesystem. Instead, you keep pushing more and more data into the stream synchronously in a tight loop, so it piles up in the stream's internal buffer.
Here's a piece of pseudocode that should work for you:
// Top of the file
const fs = require('fs');
const wstream = fs.createWriteStream('mydata.json');
// I'm not sure how you're getting the data; let's say you have it all in an object
const entry = {};
const words = Object.keys(entry);
function writeCB(index) {
  if (index >= words.length) {
    wstream.end();
    return;
  }
  const JSONtoWrite = {};
  JSONtoWrite[words[index]] = entry[words[index]];
  // Write the next entry only after this one has been flushed
  wstream.write(JSON.stringify(JSONtoWrite), () => writeCB(index + 1));
}
// Start the chain with the first entry
writeCB(0);
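The same idea can also be expressed with the return value of write() and the 'drain' event, which is how Node's writable streams signal backpressure. The following is only a minimal sketch: writeEntries is a hypothetical helper name, the entries are assumed to arrive as [word, data] pairs from any iterable (for example a generator that does the calculations lazily), and each object is written on its own line so the file can be read back line by line.

```javascript
const fs = require('fs');
const { once } = require('events');

// Hypothetical helper: writes each [word, data] pair as one JSON object per line.
// `entries` can be any iterable, so the pairs can be computed lazily.
async function writeEntries(path, entries) {
  const wstream = fs.createWriteStream(path);
  for (const [word, wordData] of entries) {
    const line = JSON.stringify({ [word]: wordData }) + '\n';
    // write() returns false once the internal buffer exceeds highWaterMark;
    // wait for 'drain' so pending data cannot pile up in memory.
    if (!wstream.write(line)) {
      await once(wstream, 'drain');
    }
  }
  wstream.end();
  // 'finish' fires after all data has been handed to the underlying file.
  await once(wstream, 'finish');
}
```

Called as writeEntries('mydata.json', someGenerator()), the loop pauses whenever the stream's buffer is full, so memory use stays roughly bounded by the buffer size rather than by the size of the dataset.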
You should wrap your data source in a readable stream too. I don't know what your source is, but you have to make sure it does not load all your data into memory.
For example, assuming your data set comes from another file where JSON objects are separated by end-of-line characters, you could create a read stream as follows:
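const { Readable } = require('stream');

class JSONReader extends Readable {
  constructor(options = {}) {
    super(options);
    this._source = options.source; // the source stream
    this._source.setEncoding('utf8'); // get strings, never split multi-byte characters
    this._buffer = '';
    this._waiting = false; // true while _read() is waiting for the source
    // Resume a pending _read() whenever the source is ready again
    this._source.on('readable', () => {
      if (this._waiting) {
        this._waiting = false;
        this._read();
      }
    });
    // At the end of the input, emit what is left in the buffer and close the stream
    this._source.on('end', () => {
      for (const line of this._buffer.split('\n')) {
        if (line) this.push(JSON.stringify(JSON.parse(line)) + '\n');
      }
      this._buffer = '';
      this.push(null);
    });
    this._source.on('error', (err) => this.destroy(err));
  }

  _read(size) {
    for (;;) {
      const lineIndex = this._buffer.indexOf('\n'); // find end of line
      if (lineIndex === -1) {
        const chunk = this._source.read(); // no complete line buffered: read more from source
        if (chunk === null) {
          this._waiting = true; // nothing available yet; retry on the next 'readable'
          return;
        }
        this._buffer += chunk;
        continue;
      }
      const line = this._buffer.slice(0, lineIndex); // the characters of one object
      this._buffer = this._buffer.slice(lineIndex + 1);
      if (line) { // skip empty lines
        const result = JSON.parse(line); // do your calculations on `result` here
        this.push(JSON.stringify(result) + '\n'); // push to the internal read queue
        return;
      }
    }
  }
}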
Now you can use it like this:
const fs = require('fs');
const source = fs.createReadStream('mySourceFile');
const reader = new JSONReader({ source });
const target = fs.createWriteStream('myTargetFile');
reader.pipe(target);
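Then you'll have a better memory flow: data moves from the source to the target in small chunks instead of being held in memory all at once.
Please note that the picture and the example above are taken from the excellent book Node.js in Practice.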
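As a possible alternative to a hand-written Readable, the same line-by-line processing can be sketched with Node's built-in Transform and pipeline, which take care of backpressure and error propagation between the stages. This is not from the book. jsonLines, convert and mapFn are hypothetical names, and the input is again assumed to hold one JSON object per line.

```javascript
const fs = require('fs');
const { Transform, pipeline } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: a Transform that parses one JSON object per input line,
// passes it through `mapFn`, and emits the result as one JSON line.
function jsonLines(mapFn) {
  const decoder = new StringDecoder('utf8'); // keeps multi-byte characters intact
  let buffer = '';
  const emit = (stream, line) => {
    if (line.trim()) stream.push(JSON.stringify(mapFn(JSON.parse(line))) + '\n');
  };
  return new Transform({
    transform(chunk, _encoding, callback) {
      try {
        const lines = (buffer + decoder.write(chunk)).split('\n');
        buffer = lines.pop(); // the last piece may be an incomplete line
        for (const line of lines) emit(this, line);
        callback();
      } catch (err) {
        callback(err); // for example, a line that is not valid JSON
      }
    },
    flush(callback) {
      try {
        emit(this, buffer + decoder.end()); // final line without a trailing newline
        callback();
      } catch (err) {
        callback(err);
      }
    },
  });
}

// Hypothetical wrapper: streams sourcePath through mapFn into targetPath.
function convert(sourcePath, targetPath, mapFn) {
  return new Promise((resolve, reject) => {
    pipeline(
      fs.createReadStream(sourcePath),
      jsonLines(mapFn),
      fs.createWriteStream(targetPath),
      (err) => (err ? reject(err) : resolve())
    );
  });
}
```

For instance, convert('mySourceFile', 'myTargetFile', (obj) => obj) copies the objects unchanged; the per-object calculations would go into mapFn.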