
Speed up Node.js when operating on rows from a large file

I want to read a file with many rows, then write the results.

It's fine with small files (<50 kB).

But I've got a 15 MB file from a programming competition as a hard input.

Node.js becomes slow and I can't get the output in time, because I have to send them the output within a few minutes. And it's not even using full CPU/RAM.


Is the problem in my code, or can I do something about it? Thanks!

const fs = require("fs");

const input = "D:\\Downloads\\example.txt";
const output = input + ".final.txt";

var lineReader = require("readline").createInterface({
  input: fs.createReadStream(input),
});
let out = "";
let all = [];
const line_counter = (
  (i = 0) =>
  () =>
    ++i
)();
lineReader.on("line", function (radek, index = line_counter()) {
  all.push(radek);
});
all.forEach((v) => {
  out += `${v}\n`;
});

fs.writeFile(output, out, (err) => {
  if (err) {
    console.error(err);
  }
});

It seems like you want a better understanding of how to do the following using a streaming technique:

  • read an input text file stream line-by-line
  • perform a transform operation on each line of text
  • write the result of each transform operation to an output file stream

Node supports the web standard streams API — see the list of global objects in the current LTS version of Node (18): https://nodejs.org/docs/latest-v18.x/api/globals.html .

Below, I'll include a complete, minimal example which demonstrates the criteria above — you can use it as a model for learning and adapt it to meet the needs of your program. Because your goal is learning, I've included verbose comments at every step of the program, including links to documentation.

You'll also probably find it helpful to read about the Streams API on MDN and web.dev.

module.mjs:

import {open} from 'node:fs/promises';
import {Writable} from 'node:stream';

// Break string chunks from a ReadableStream into lines. Adapted from Deno's std library:
// See: https://github.com/denoland/deno_std/blob/0.166.0/streams/delimiter.ts#L11-L68
class TextLineStream extends TransformStream {
  #buf = "";

  constructor() {
    super({
      transform: (chunk, controller) => this.#handle(chunk, controller),
      // Flushing a sentinel "\r\n" on stream end forces any remaining buffered text out as a final line:
      flush: (controller) => this.#handle("\r\n", controller),
    });
  }

  #handle(chunk, controller) {
    chunk = this.#buf + chunk;

    while (true) {
      const lfIndex = chunk.indexOf("\n");

      if (lfIndex !== -1) {
        let crOrLfIndex = lfIndex;
        if (chunk[lfIndex - 1] === "\r") {
          crOrLfIndex--;
        }
        controller.enqueue(chunk.slice(0, crOrLfIndex));
        chunk = chunk.slice(lfIndex + 1);
        continue;
      }

      break;
    }

    this.#buf = chunk;
  }
}

async function main () {
  // Paths based on your examples:
  const pathIn = 'example.txt';
  const pathOut = `${pathIn}.final.txt`;

  // Create file handles to the target file paths:
  // See: https://nodejs.org/docs/latest-v18.x/api/fs.html#fspromisesopenpath-flags-mode
  const fhIn = await open(pathIn);

  // The "w" flag means: Open file for writing. The file is created (if it does not exist) or truncated (if it exists).
  // See: https://nodejs.org/docs/latest-v18.x/api/fs.html#file-system-flags
  const fhOut = await open(pathOut, 'w');

  // Create a web-standard WritableStream from the output file handle:
  // See: https://nodejs.org/docs/latest-v18.x/api/stream.html#streamwritabletowebstreamwritable
  const writable = Writable.toWeb(fhOut.createWriteStream({encoding: 'utf8'}));
  const writer = writable.getWriter();

  // A function abstraction for writing a text chunk to the output file stream:
  const write = (text) => writer.ready.then(() => writer.write(text));

  // Create a web-standard ReadableStream from the input file handle,
  // then pipe through a text decoder and break/collect the emitted chunks into lines:
  // See: https://nodejs.org/docs/latest-v18.x/api/fs.html#filehandlereadablewebstream
  const readable = fhIn.readableWebStream()
    .pipeThrough(new TextDecoderStream())
    .pipeThrough(new TextLineStream());

  for await (const line of readable) {
    // Handle each text line in here:

    // For example: get the length of each line,
    // and if it's greater than 0, write it as a line to the output stream:
    const {length} = line;
    if (length > 0) await write(`${length}\n`);
  }

  // Close the writer so any remaining buffered output is flushed before the program exits:
  await writer.close();
}

main();
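Incidentally, the TextLineStream class can be exercised on its own with in-memory chunks, which makes it easy to see why the #buf carry-over is needed when a chunk boundary falls mid-line. Here's a standalone sketch (the class is repeated so the snippet runs by itself; ReadableStream and TransformStream are globals in Node 18):

```javascript
// Standalone check of the line-splitting logic with in-memory string chunks.
// The class body is repeated here only so this snippet runs on its own.
class TextLineStream extends TransformStream {
  #buf = "";

  constructor() {
    super({
      transform: (chunk, controller) => this.#handle(chunk, controller),
      // Flushing a sentinel "\r\n" forces any remaining buffered text out as a final line:
      flush: (controller) => this.#handle("\r\n", controller),
    });
  }

  #handle(chunk, controller) {
    chunk = this.#buf + chunk;

    while (true) {
      const lfIndex = chunk.indexOf("\n");
      if (lfIndex === -1) break;

      let crOrLfIndex = lfIndex;
      if (chunk[lfIndex - 1] === "\r") crOrLfIndex--;
      controller.enqueue(chunk.slice(0, crOrLfIndex));
      chunk = chunk.slice(lfIndex + 1);
    }

    this.#buf = chunk;
  }
}

// Chunk boundaries deliberately fall mid-line to show the buffering at work:
const source = new ReadableStream({
  start(controller) {
    controller.enqueue("lor");
    controller.enqueue("em\r\nipsum\n");
    controller.enqueue("dolor"); // no trailing newline — emitted on flush
    controller.close();
  },
});

const lines = [];
for await (const line of source.pipeThrough(new TextLineStream())) {
  lines.push(line);
}
console.log(lines); // [ 'lorem', 'ipsum', 'dolor' ]
```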

Here's the CLI output of using the program on an example text file with some Lorem ipsum lines:

% node --version
v18.12.1

% ls
example.txt module.mjs

% cat example.txt
lorem
ipsum
dolor
sit
amet

% node module.mjs

% cat example.txt.final.txt
5
5
5
3
4
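As an aside: if you'd rather stay closer to the readline approach in your original code, the same read–transform–write pipeline can be sketched with async iteration. Note that for await also sidesteps a bug in the original program: all.forEach and fs.writeFile run synchronously right after the "line" listener is registered — before any lines have actually been read — so all is still empty at that point. This is a minimal sketch; the sample input file is created inline only so the snippet runs on its own, so substitute your real paths:

```javascript
import { writeFileSync, createReadStream, createWriteStream } from 'node:fs';
import { createInterface } from 'node:readline';

const input = 'example.txt';
const output = `${input}.final.txt`;
writeFileSync(input, 'lorem\nipsum\ndolor\n'); // demo input only — use your real file

const rl = createInterface({
  input: createReadStream(input),
  crlfDelay: Infinity, // treat "\r\n" as a single line break
});
const out = createWriteStream(output);

// `for await` only advances once each line is available, so — unlike a
// synchronous loop placed right after `lineReader.on("line", ...)` — no lines are missed:
for await (const line of rl) {
  // Transform each line here; writing its length is just a placeholder:
  out.write(`${line.length}\n`);
}

// Wait until buffered data has been flushed to disk before the program exits:
await new Promise((resolve) => out.end(resolve));
```

With the demo input above, example.txt.final.txt ends up containing one length per line (5, 5, 5). Because the write stream buffers internally, memory stays bounded even for a 15 MB input, instead of growing with one large accumulated string.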
