Read line by line and push to queue, asynchronously in node.js

I'm new to the whole async/promise concept and am trying to figure out how to optimize a lambda I'm writing in node.js so that it executes actions in parallel.

I'm reading line by line from a long file (about 100k rows) and I want to push a message to SQS for each row. I need to wait for all of those processes to return before the function finishes executing.

The code below works, but it seems to read every line in the file before the inner promises that write to the queue begin executing, and then at the end it pushes all of the messages to the queue because of the call to Promise.all.

Is there a way to run these actions in parallel? The file read has to be sequential but I'd expect to see calls to the queue mixed in there.

// Assumed setup for this snippet: Node's fs/readline modules and an AWS SDK v2
// SQS client (the .promise() call below is the v2 API).
const fs = require('fs');
const readline = require('readline');
const AWS = require('aws-sdk');
const sqs = new AWS.SQS();

exports.queueUpdates = async (filePath) => {
  return new Promise((resolve, reject) => {
    const rl = readline.createInterface({
      input: fs.createReadStream(filePath),
      crlfDelay: Infinity
    });

    const queuePromises = [];

    rl.on('line', (line) => {
      console.log("read line", line);
      const message = exports.queueMessageForLine(line); // This function returns the message to be sent to SQS as a JSON object.
      if (message !== null) {
        console.log("Pushing message", message);
        queuePromises.push(
          sqs.sendMessage(message).promise()
          .then((result) => {
            console.log("Enqueued message", message, result);
            return message;
          })
          .catch((err) => {
            console.error("Failed adding message to queue", err);
            return message;
          })
        );
      }
    }).on('close', () => {
      console.log("file read");
      Promise.all(queuePromises).then((results) => {
        resolve(results);
      });
    }).on('error', (err) => {
      console.error(err, "Error in reading the file contents");
      reject(err);
    });
  });
};

The output I'd expect here, if it were doing what I want, would be something like:

read line, line 1
read line, line 2
Enqueued message, message 1
read line, line 3
Enqueued message, message 2
Enqueued message, message 3
etc.

All mixed together.

After more investigation, this is actually behaving the way I wanted. The issue was what James noted in the comments: readline pulls in multiple lines in bulk from the stream and then dispatches a 'line' event for each of them. When testing with a smaller file locally, there just wasn't enough content to require more than a single read, so the execution looked sequential. After pushing it up to AWS and reading the full 100k+ line file, the reads and enqueues are clearly interleaved as intended.
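
To see the same interleaving locally without touching AWS, here is a minimal sketch under one stated assumption: fakeSend below is a hypothetical stand-in for sqs.sendMessage(message).promise() that simply resolves on a later event-loop turn, the way a network call would, and the file path and LOG_EVERY value are made up for the demo. With a file large enough to need several stream reads, the sampled "read line" and "enqueued" logs mix together; a tiny file fits in a single read and looks sequential.

// Minimal local demo of readline's chunked dispatch, no AWS involved.
const fs = require('fs');
const os = require('os');
const path = require('path');
const readline = require('readline');

// Write a throwaway ~100k-line file so readline has to read multiple chunks.
const filePath = path.join(os.tmpdir(), 'interleave-demo.txt');
fs.writeFileSync(filePath, Array.from({ length: 100000 }, (_, i) => `row ${i}`).join('\n'));

// Hypothetical stand-in for the SQS call: resolves asynchronously, not inline.
const fakeSend = (line) => new Promise((resolve) => setImmediate(() => resolve(line)));

const LOG_EVERY = 20000; // only log a sample of events so the output stays readable

(async () => {
  const rl = readline.createInterface({
    input: fs.createReadStream(filePath),
    crlfDelay: Infinity
  });

  const pending = [];
  let linesRead = 0;
  let sent = 0;

  rl.on('line', (line) => {
    linesRead += 1;
    if (linesRead % LOG_EVERY === 0) console.log('read line', linesRead);
    pending.push(
      fakeSend(line).then(() => {
        sent += 1;
        if (sent % LOG_EVERY === 0) console.log('enqueued', sent);
      })
    );
  });

  await new Promise((resolve) => rl.on('close', resolve));
  await Promise.all(pending);
  console.log('done:', pending.length, 'messages "sent"');
})();

Running this with node (saved, say, as interleave-demo.js) should show the two kinds of log lines alternating; shrinking the file to a few hundred rows collapses it back to the apparently sequential output observed with the small local test file.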
