
Reading a file line by line and performing analysis in Node.js?

I have a file that I want to read line by line, and for every line extracted I perform some expensive analysis and then save the results to the database. In short, I have something like this:

const fs = require('fs');
const path = require('path');
const readline = require('readline');

async function analyzeAndSave(url) {
  // Removed for brevity, but this function takes a minute or so to finish.
}

async function run() {
  try {
    const dataPath = path.join(path.dirname(require.main.filename), 'data/urls.txt');

    const rl = readline.createInterface({
      input: fs.createReadStream(dataPath),
    });

    let line_no = 0;
    rl.on('line', async (url) => {
      line_no++;
      logger.info(`Analyzing: ${url}`);
      await analyzeAndSave(url);
    });
  } catch (err) {
    // Error caught.
    logger.error(err);
  }
}

run();

The problem with this is that it doesn't wait for the analysis of one line to finish; it tries to run multiple analyses at once. I can see this because it initially prints all the lines via `logger.info(`Analyzing: ${url}`)`. So it is not executed sequentially. How can I make sure that one line finishes before moving on to the next?

I think this will be helpful to you; it is explained and demonstrated here:

Nodejs - read line by line from file, perform async action for each line and resume

Someone suggested using a library called line-by-line for big files.

@JavierFerrero suggested a solution like this:

var LineByLineReader = require('line-by-line'),
    lr = new LineByLineReader('big_file.txt');

lr.on('error', function (err) {
    // 'err' contains error object
});

lr.on('line', function (line) {
    // pause emitting of lines...
    lr.pause();

    // ...do your asynchronous line processing..
    setTimeout(function () {

        // ...and continue emitting lines.
        lr.resume();
    }, 100);
});

lr.on('end', function () {
    // All lines are read, file is closed now.
});

You can also pass a callback, waiting for the operation to finish.

const fs = require('fs');

function run(path, cb) {
    fs.readFile(path, 'utf8', function (err, data) {
        // Note: a try/catch around this call would not work, because a
        // `throw` inside an asynchronous callback escapes any surrounding
        // try/catch. Pass the error to the callback instead.
        if (err) return cb(err);
        cb(null, data);
    });
}

run('./test.txt', (err, response) => {
    if (err) return console.error(err);
    // We are done, now continue
    console.log(response);
});

The readline interface emits its 'line' events asynchronously, and doing an await inside one handler doesn't stop the others from being fired. Instead, you can buffer the lines in an array like this:

rl.on('line', url => urls.push(url));
rl.on('close', async () => {
  for (const url of urls) {
    await analyzeAndSave(url);
  }
});

where urls is initialized to an empty array before the readline interface is created.
