
Node.js: fs writestream stops writing to file when the file gets too big

I'm scraping a webpage with lots of data on it, formatted as an HTML table. You have to submit a form to generate the table. My node script submits all the permutations of the form, and each time scrapes the resulting table, turning each row into a line of data.

The problem is, when I write the data to some file, it stops working when the file gets to be about 10MB in size. Sometimes it's a little less; sometimes a little more. I have tried writing the file as .csv, .json, and .txt, and each time the same problem occurs.

I am using fs to perform this task. The relevant code is:

var fs = require("fs");
var stream = fs.createWriteStream("data.csv"); // can also be .json or .txt

stream.write(line_of_data);

I can console.log(line_of_data) and it works fine, all the way through until there's no data left to scrape. But at about 10MB, the output file won't accept any more lines of data. The stopping point seems almost completely arbitrary: every time I run the script, it stops writing at a different point. I have plenty of storage space on my hard drive, so the problem must have to do with something else.
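One thing worth ruling out with symptoms like these is backpressure: stream.write() returns false once the stream's internal buffer is full, and a script that keeps writing (or exits) without waiting for the "drain" and "finish" events can lose whatever is still buffered. I can't confirm that this is the cause here, so treat the following as a minimal sketch. The function name writeLines and its lines argument are made up for illustration.

```javascript
const fs = require("fs");
const { once } = require("events");

// Hypothetical helper: write every line to `path`, pausing whenever the
// stream's internal buffer is full, and resolve only after the file is flushed.
async function writeLines(path, lines) {
  const stream = fs.createWriteStream(path);
  for (const line of lines) {
    // write() returns false once the buffered data exceeds highWaterMark
    if (!stream.write(line + "\n")) {
      // wait until the buffer has been flushed before writing more
      await once(stream, "drain");
    }
  }
  // signal that no more data is coming, then wait for the final flush
  stream.end();
  await once(stream, "finish");
}
```

In the scraper, this would mean awaiting writeLines("data.csv", rows) (or the equivalent drain check after each write) before letting the process exit.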

I ended up using MongoDB to store the data. To install the MongoDB driver as a Node module, run npm install mongodb --save. The relevant JavaScript is:

var MongoClient = require("mongodb").MongoClient;
MongoClient.connect("mongodb://localhost:27017/database", function(err, db) {

  if (!err) {

    // set up mongodb collection
    db.createCollection("collection", function(err, collection) {}); 
    var collection = db.collection("collection");

    // after scraping data...
    // insert a data object (line_of_data)
    // note: newer driver versions replace insert() with insertOne()/insertMany()
    collection.insert(line_of_data, {w: 1}, function(err, result) {
      if (err) console.log(err);
    });

  }

});

Some commands to convert the data:

  1. Export as CSV: mongoexport --db database --collection collection --out data.csv --type=csv --fields 'field1,field2,field3' (substitute your own field names, comma-separated with no spaces)
  2. Convert to JSON: csvtojson data.csv > data.json (requires csvtojson)
  3. Validate JSON: jsonlint data.json (requires jsonlint)
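
If you'd rather not install an extra tool for step 2, the conversion can be roughly sketched in plain Node. This is not how csvtojson works internally. It is a simplified illustration that assumes a header row and no quoted fields or embedded commas, and the function name csvToObjects is made up for this example.

```javascript
// Hypothetical helper: turn simple CSV text (header row, no quoted fields)
// into an array of objects keyed by the header names.
function csvToObjects(csvText) {
  // split into rows and drop empty ones (e.g. the trailing newline)
  const rows = csvText.split(/\r?\n/).filter((row) => row.length > 0);
  if (rows.length === 0) return [];
  const headers = rows[0].split(",");
  return rows.slice(1).map((row) => {
    const cells = row.split(",");
    const record = {};
    headers.forEach((header, i) => {
      // missing trailing cells become empty strings
      record[header] = cells[i] !== undefined ? cells[i] : "";
    });
    return record;
  });
}
```

Passing the contents of data.csv to csvToObjects and then calling JSON.stringify on the result gives the JSON text to write to data.json. All values stay strings, so numeric fields would need converting separately.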
