Read a file one line at a time in node.js?

I am trying to read a large file one line at a time. I found a question on Quora that dealt with the subject, but I'm missing some connections to make the whole thing fit together.

 var Lazy=require("lazy");
 new Lazy(process.stdin)
     .lines
     .forEach(
          function(line) { 
              console.log(line.toString()); 
          }
 );
 process.stdin.resume();

The bit that I'd like to figure out is how I might read one line at a time from a file instead of STDIN as in this sample.

I tried:

 fs.open('./VeryBigFile.csv', 'r', '0666', Process);

 function Process(err, fd) {
    if (err) throw err;
    // DO lazy read 
 }

but it's not working. I know that in a pinch I could fall back to using something like PHP, but I would like to figure this out.

I don't think the other answer would work, as the file is much larger than the server I'm running it on has memory for.

Since Node.js v0.12 and as of Node.js v4.0.0, there is a stable readline core module. Here's the easiest way to read lines from a file, without any external modules:

const fs = require('fs');
const readline = require('readline');

async function processLineByLine() {
  const fileStream = fs.createReadStream('input.txt');

  const rl = readline.createInterface({
    input: fileStream,
    crlfDelay: Infinity
  });
  // Note: we use the crlfDelay option to recognize all instances of CR LF
  // ('\r\n') in input.txt as a single line break.

  for await (const line of rl) {
    // Each line in input.txt will be successively available here as `line`.
    console.log(`Line from file: ${line}`);
  }
}

processLineByLine();

Or alternatively:

var lineReader = require('readline').createInterface({
  input: require('fs').createReadStream('file.in')
});

lineReader.on('line', function (line) {
  console.log('Line from file:', line);
});

The last line is read correctly (as of Node v0.12 or later), even if there is no final \n.

UPDATE: this example has been added to Node's official API documentation.

For such a simple operation there shouldn't be any dependency on third-party modules. Go easy.

var fs = require('fs'),
    readline = require('readline');

var rd = readline.createInterface({
    input: fs.createReadStream('/path/to/file'),
    output: process.stdout,
    terminal: false
});

rd.on('line', function(line) {
    console.log(line);
});

You don't have to open the file; instead, you have to create a ReadStream.

fs.createReadStream

Then pass that stream to Lazy:
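For example, a minimal sketch reusing the Lazy API from the question (the CSV file name here is just a placeholder):

 var Lazy = require("lazy");
 var fs = require("fs");

 // Create a ReadStream for the file and hand it to Lazy instead of process.stdin
 new Lazy(fs.createReadStream('./VeryBigFile.csv'))
     .lines
     .forEach(function (line) {
         console.log(line.toString()); // each line arrives as a Buffer
     });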

require('fs').readFileSync('file.txt', 'utf-8').split(/\r?\n/).forEach(function(line){
  console.log(line);
})

Update in 2019

An awesome example is already posted in the official Node.js documentation, here.

This requires the latest Node.js (>11.4) to be installed on your machine.

const fs = require('fs');
const readline = require('readline');

async function processLineByLine() {
  const fileStream = fs.createReadStream('input.txt');

  const rl = readline.createInterface({
    input: fileStream,
    crlfDelay: Infinity
  });
  // Note: we use the crlfDelay option to recognize all instances of CR LF
  // ('\r\n') in input.txt as a single line break.

  for await (const line of rl) {
    // Each line in input.txt will be successively available here as `line`.
    console.log(`Line from file: ${line}`);
  }
}

processLineByLine();

There is a very nice module for reading a file line by line; it's called line-reader.

With it you simply write:

var lineReader = require('line-reader');

lineReader.eachLine('file.txt', function(line, last) {
  console.log(line);
  // do whatever you want with line...
  if(last){
    // or check if it's the last one
  }
});

You can even iterate the file with a "java-style" interface, if you need more control:

lineReader.open('file.txt', function(reader) {
  if (reader.hasNextLine()) {
    reader.nextLine(function(line) {
      console.log(line);
    });
  }
});

Old topic, but this works:

var fs = require('fs');
var readline = require('readline');

var rl = readline.createInterface({
      input : fs.createReadStream('/path/file.txt'),
      output: process.stdout,
      terminal: false
})
rl.on('line',function(line){
     console.log(line) //or parse line
})

Simple. No need for an external module.

You can always roll your own line reader. I haven't benchmarked this snippet yet, but it correctly splits the incoming stream of chunks into lines without the trailing '\n'.

var last = "";

process.stdin.on('data', function(chunk) {
    var lines, i;

    lines = (last+chunk).split("\n");
    for(i = 0; i < lines.length - 1; i++) {
        console.log("line: " + lines[i]);
    }
    last = lines[i];
});

process.stdin.on('end', function() {
    console.log("line: " + last);
});

process.stdin.resume();

I did come up with this when working on a quick log-parsing script that needed to accumulate data during the log parsing, and I felt it would be nice to try doing this using js and node instead of perl or bash.

Anyway, I do feel that small nodejs scripts should be self-contained and not rely on third-party modules, so after reading all the answers to this question, each using various modules to handle line parsing, a 13 SLOC native nodejs solution might be of interest.

With the carrier module:

var carrier = require('carrier');

process.stdin.resume();
carrier.carry(process.stdin, function(line) {
    console.log('got one line: ' + line);
});

I ended up with a massive, massive memory leak using Lazy to read line by line when trying to then process those lines and write them to another stream, due to the way drain/pause/resume in node works (see: http://elegantcode.com/2011/04/06/taking-baby-steps-with-node-js-pumping-data-between-streams/ (I love this guy btw)). I haven't looked closely enough at Lazy to understand exactly why, but I couldn't pause my read stream to allow for a drain without Lazy exiting.

I wrote the code to process massive csv files into xml docs; you can see the code here: https://github.com/j03m/node-csv2xml

If you run the previous revisions with Lazy lines it leaks. The latest revision doesn't leak at all, and you can probably use it as the basis for a reader/processor, though I have some custom stuff in there.

Edit: I guess I should also note that my code with Lazy worked fine until I found myself writing large enough xml fragments that drain/pause/resume became a necessity. For smaller chunks it was fine.

Edit:

Use a transform stream.
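A minimal sketch of that idea (the LineSplitter class name and file path are placeholders, not part of the original answer): the Transform splits incoming chunks into lines and re-emits them one at a time, so pause/resume/drain is handled by the stream machinery rather than by hand.

const { Transform } = require('stream');
const fs = require('fs');

// Splits incoming text chunks into lines and pushes one line per 'data' event.
class LineSplitter extends Transform {
    constructor() {
        super({ readableObjectMode: true, decodeStrings: false });
        this.leftover = '';
    }
    _transform(chunk, encoding, callback) {
        const lines = (this.leftover + chunk).split(/\r?\n/);
        this.leftover = lines.pop(); // keep the partial last line for the next chunk
        for (const line of lines) this.push(line);
        callback();
    }
    _flush(callback) {
        if (this.leftover) this.push(this.leftover); // final line without a trailing newline
        callback();
    }
}

fs.createReadStream('/path/to/file', { encoding: 'utf8' })
    .pipe(new LineSplitter())
    .on('data', (line) => console.log(line));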


With a BufferedReader you can read lines.

new BufferedReader ("lorem ipsum", { encoding: "utf8" })
    .on ("error", function (error){
        console.log ("error: " + error);
    })
    .on ("line", function (line){
        console.log ("line: " + line);
    })
    .on ("end", function (){
        console.log ("EOF");
    })
    .read ();

Since posting my original answer, I found that split is a very easy-to-use node module for line reading in a file; it also accepts optional parameters.

var split = require('split');
fs.createReadStream(file)
    .pipe(split())
    .on('data', function (line) {
      //each chunk now is a separate line! 
    });

Haven't tested on very large files. Let us know if you do.

In most cases this should be enough:

const fs = require("fs")

fs.readFile('./file', 'utf-8', (err, file) => {
  const lines = file.split('\n')

  for (let line of lines)
    console.log(line)
});

I was frustrated by the lack of a comprehensive solution for this, so I put together my own attempt (git / npm). Copy-pasted list of features:

  • Interactive line processing (callback-based, no loading the entire file into RAM)
  • Optionally, return all lines in an array (detailed or raw mode)
  • Interactively interrupt streaming, or perform map/filter-like processing
  • Detect any newline convention (PC/Mac/Linux)
  • Correct eof / last line treatment
  • Correct handling of multi-byte UTF-8 characters
  • Retrieve byte offset and byte length information on a per-line basis
  • Random access, using line-based or byte-based offsets
  • Automatically map line-offset information, to speed up random access
  • Zero dependencies
  • Tests

NIH? You decide :-)

I wanted to tackle this same problem, basically what in Perl would be:

while (<>) {
    process_line($_);
}

My use case was just a standalone script, not a server, so synchronous was fine. These were my criteria:

  • The minimal synchronous code that could be reused in many projects.
  • No limits on file size or number of lines.
  • No limits on the length of lines.
  • Able to handle full Unicode in UTF-8, including characters beyond the BMP.
  • Able to handle *nix and Windows line endings (old-style Mac not needed for me).
  • Line ending character(s) to be included in lines.
  • Able to handle the last line with or without end-of-line characters.
  • Not use any external libraries not included in the node.js distribution.

This is a project for me to get a feel for low-level scripting-type code in node.js and decide how viable it is as a replacement for other scripting languages like Perl.

After a surprising amount of effort and a couple of false starts, this is the code I came up with. It's pretty fast but less trivial than I would've expected: (fork it on GitHub)

var fs            = require('fs'),
    StringDecoder = require('string_decoder').StringDecoder,
    util          = require('util');

function lineByLine(fd) {
  var blob = '';
  var blobStart = 0;
  var blobEnd = 0;

  var decoder = new StringDecoder('utf8');

  var CHUNK_SIZE = 16384;
  var chunk = new Buffer(CHUNK_SIZE);

  var eolPos = -1;
  var lastChunk = false;

  var moreLines = true;
  var readMore = true;

  // each line
  while (moreLines) {

    readMore = true;
    // append more chunks from the file onto the end of our blob of text until we have an EOL or EOF
    while (readMore) {

      // do we have a whole line? (with LF)
      eolPos = blob.indexOf('\n', blobStart);

      if (eolPos !== -1) {
        blobEnd = eolPos;
        readMore = false;

      // do we have the last line? (no LF)
      } else if (lastChunk) {
        blobEnd = blob.length;
        readMore = false;

      // otherwise read more
      } else {
        var bytesRead = fs.readSync(fd, chunk, 0, CHUNK_SIZE, null);

        lastChunk = bytesRead !== CHUNK_SIZE;

        blob += decoder.write(chunk.slice(0, bytesRead));
      }
    }

    if (blobStart < blob.length) {
      processLine(blob.substring(blobStart, blobEnd + 1));

      blobStart = blobEnd + 1;

      if (blobStart >= CHUNK_SIZE) {
        // blobStart is in characters, CHUNK_SIZE is in octets
        var freeable = blobStart / CHUNK_SIZE;

        // keep blob from growing indefinitely, not as deterministic as I'd like
        blob = blob.substring(CHUNK_SIZE);
        blobStart -= CHUNK_SIZE;
        blobEnd -= CHUNK_SIZE;
      }
    } else {
      moreLines = false;
    }
  }
}

It could probably be cleaned up further; it was the result of trial and error.

function createLineReader(fileName){
    var EM = require("events").EventEmitter
    var ev = new EM()
    var stream = require("fs").createReadStream(fileName)
    var remainder = null;
    stream.on("data",function(data){
        if(remainder != null){//append newly received data chunk
            var tmp = new Buffer(remainder.length+data.length)
            remainder.copy(tmp)
            data.copy(tmp,remainder.length)
            data = tmp;
        }
        var start = 0;
        for(var i=0; i<data.length; i++){
            if(data[i] == 10){ //\n new line
                var line = data.slice(start,i)
                ev.emit("line", line)
                start = i+1;
            }
        }
        if(start<data.length){
            remainder = data.slice(start);
        }else{
            remainder = null;
        }
    })

    stream.on("end",function(){
        if(null!=remainder) ev.emit("line",remainder)
    })

    return ev
}


//---------main---------------
fileName = process.argv[2]

lineReader = createLineReader(fileName)
lineReader.on("line",function(line){
    console.log(line.toString())
    //console.log("++++++++++++++++++++")
})

Generator-based line reader: https://github.com/neurosnap/gen-readlines

var fs = require('fs');
var readlines = require('gen-readlines');

fs.open('./file.txt', 'r', function(err, fd) {
  if (err) throw err;
  fs.fstat(fd, function(err, stats) {
    if (err) throw err;

    for (var line of readlines(fd, stats.size)) {
      console.log(line.toString());
    }

  });
});

If you want to read a file line by line and write it to another file:

var fs = require('fs');
var readline = require('readline');
var Stream = require('stream');

function readFileLineByLine(inputFile, outputFile) {

   var instream = fs.createReadStream(inputFile);
   var outstream = new Stream();
   outstream.readable = true;
   outstream.writable = true;

   var rl = readline.createInterface({
      input: instream,
      output: outstream,
      terminal: false
   });

   rl.on('line', function (line) {
        fs.appendFileSync(outputFile, line + '\n');
   });
};
var fs = require('fs');

function readfile(name,online,onend,encoding) {
    var bufsize = 1024;
    var buffer = new Buffer(bufsize);
    var bufread = 0;
    var fd = fs.openSync(name,'r');
    var position = 0;
    var eof = false;
    var data = "";
    var lines = 0;

    encoding = encoding || "utf8";

    function readbuf() {
        bufread = fs.readSync(fd,buffer,0,bufsize,position);
        position += bufread;
        eof = bufread ? false : true;
        data += buffer.toString(encoding,0,bufread);
    }

    function getLine() {
        var nl = data.indexOf("\r"), hasnl = nl !== -1;
        if (!hasnl && eof) return fs.closeSync(fd), online(data,++lines), onend(lines); 
        if (!hasnl && !eof) readbuf(), nl = data.indexOf("\r"), hasnl = nl !== -1;
        if (!hasnl) return process.nextTick(getLine);
        var line = data.substr(0,nl);
        data = data.substr(nl+1);
        if (data[0] === "\n") data = data.substr(1);
        online(line,++lines);
        process.nextTick(getLine);
    }
    getLine();
}

I had the same problem and came up with the above solution. It looks similar to others but is async and can read large files very quickly.

Hope this helps.

Two questions we must ask ourselves while doing such operations are:

  1. What's the amount of memory used to perform it?
  2. Is the memory consumption increasing drastically with the file size?

Solutions like require('fs').readFileSync() load the whole file into memory. That means the amount of memory required to perform operations will be almost equivalent to the file size. We should avoid these for anything larger than 50 MB.

We can easily track the amount of memory used by a function by placing these lines of code after the function invocation:

    const used = process.memoryUsage().heapUsed / 1024 / 1024;
    console.log(
      `The script uses approximately ${Math.round(used * 100) / 100} MB`
    );

Right now the best way to read particular lines from a large file is using node's readline. The documentation has amazing examples.

Although we don't need any third-party module to do it, if you are writing enterprise code you have to handle lots of edge cases. I had to write a very lightweight module called Apick File Storage to handle all those edge cases.

Apick File Storage module: https://www.npmjs.com/package/apickfs Documentation: https://github.com/apickjs/apickFS#readme

Example file: https://1drv.ms/t/s!AtkMCsWInsSZiGptXYAFjalXOpUx

Example: install the module

npm i apickfs
// import module
const apickFileStorage = require('apickfs');
//invoke readByLineNumbers() method
apickFileStorage
  .readByLineNumbers(path.join(__dirname), 'big.txt', [163845])
  .then(d => {
    console.log(d);
  })
  .catch(e => {
    console.log(e);
  });

This method was successfully tested with dense files of up to 4 GB.

big.text is a dense text file with 163,845 lines and is 124 MB. The script to read 10 different lines from this file uses only approximately 4.63 MB of memory. And it parses valid JSON to Objects or Arrays for free. Awesome!!

We can read a single line of the file or hundreds of lines of the file with very little memory consumption.

I have a little module which does this well and is used by quite a few other projects: npm readline. Note that in node v10 there is a native readline module, so I republished my module as linebyline: https://www.npmjs.com/package/linebyline

If you don't want to use the module, the function is very simple:

var fs = require('fs'),
EventEmitter = require('events').EventEmitter,
util = require('util'),
newlines = [
  13, // \r
  10  // \n
];
var readLine = module.exports = function(file, opts) {
if (!(this instanceof readLine)) return new readLine(file);

EventEmitter.call(this);
opts = opts || {};
var self = this,
  line = [],
  lineCount = 0,
  emit = function(line, count) {
    self.emit('line', new Buffer(line).toString(), count);
  };
  this.input = fs.createReadStream(file);
  this.input.on('open', function(fd) {
    self.emit('open', fd);
  })
  .on('data', function(data) {
   for (var i = 0; i < data.length; i++) {
    if (0 <= newlines.indexOf(data[i])) { // Newline char was found.
      lineCount++;
      if (line.length) emit(line, lineCount);
      line = []; // Empty buffer.
     } else {
      line.push(data[i]); // Buffer new line data.
     }
   }
 }).on('error', function(err) {
   self.emit('error', err);
 }).on('end', function() {
  // Emit last line if anything left over since EOF won't trigger it.
  if (line.length){
     lineCount++;
     emit(line, lineCount);
  }
  self.emit('end');
 }).on('close', function() {
   self.emit('close');
 });
};
util.inherits(readLine, EventEmitter);

Another solution is to run the logic via the sequential executor nsynjs. It reads the file line-by-line using node's readline module, and it doesn't use promises or recursion, so it is not going to fail on large files. Here is how the code will look:

var nsynjs = require('nsynjs');
var textFile = require('./wrappers/nodeReadline').textFile; // this file is part of nsynjs

function process(textFile) {

    var fh = new textFile();
    fh.open('path/to/file');
    var s;
    while (typeof(s = fh.readLine(nsynjsCtx).data) != 'undefined')
        console.log(s);
    fh.close();
}

var ctx = nsynjs.run(process,{},textFile,function () {
    console.log('done');
});

The code above is based on this example: https://github.com/amaksr/nsynjs/blob/master/examples/node-readline/index.js

This is my favorite way of going through a file, a simple native solution for a progressive (as in not a "slurp" or all-in-memory way) file read with modern async/await. It's a solution that I find "natural" when processing large text files without having to resort to the readline package or any non-core dependency.

let buf = '';
for await ( const chunk of fs.createReadStream('myfile') ) {
    const lines = buf.concat(chunk).split(/\r?\n/);
    buf = lines.pop();
    for( const line of lines ) {
        console.log(line);
    }
}
if(buf.length) console.log(buf);  // last line, if file does not end with newline

You can adjust encoding in the fs.createReadStream or use chunk.toString(<arg>). This also lets you fine-tune the line splitting to your taste, i.e. use .split(/\n+/) to skip empty lines, and control the chunk size with { highWaterMark: <chunkSize> }.

Don't forget to create a function like processLine(line) to avoid repeating the line-processing code twice due to the ending buf leftover (a sketch follows below). Unfortunately, the ReadStream instance does not update its end-of-file flags in this setup, so there's no way, afaik, to detect within the loop that we're in the last iteration without some more verbose tricks like comparing the file size from fs.Stats() with .bytesRead. Hence the final buf processing solution, unless you're absolutely sure your file ends with a newline \n, in which case the for await loop should suffice.
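A hedged sketch of those adjustments, factoring the per-line work into a processLine() helper and setting a custom highWaterMark (the function name, file name, and chunk size here are just examples):

const fs = require('fs');

// Placeholder for whatever per-line work you need; called for every line,
// including the final unterminated one.
function processLine(line) {
    console.log(line);
}

async function run() {
    let buf = '';
    const stream = fs.createReadStream('myfile', { encoding: 'utf8', highWaterMark: 64 * 1024 });
    for await (const chunk of stream) {
        const lines = buf.concat(chunk).split(/\r?\n/);
        buf = lines.pop();            // keep the trailing partial line
        lines.forEach(processLine);
    }
    if (buf.length) processLine(buf); // last line, if the file does not end with a newline
}

run();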

★ If you prefer the evented asynchronous version, this would be it:

let buf = '';
fs.createReadStream('myfile')
.on('data', chunk => {
    const lines = buf.concat(chunk).split(/\r?\n/);
    buf = lines.pop();
    for( const line of lines ) {
        console.log(line);
    }
})
.on('end', () => buf.length && console.log(buf) );

★ Now if you don't mind importing the stream core package, then this is the equivalent piped stream version, which allows for chaining transforms like gzip decompression:

const { Writable } = require('stream');
let buf = '';
fs.createReadStream('myfile').pipe(
    new Writable({
        write: (chunk, enc, next) => {
            const lines = buf.concat(chunk).split(/\r?\n/);
            buf = lines.pop();
            for (const line of lines) {
                console.log(line);
            }
            next();
        }
    })
).on('finish', () => buf.length && console.log(buf) );

I use this:

function emitLines(stream, re){
    re = re || /\n/;
    var buffer = '';

    stream.on('data', stream_data);
    stream.on('end', stream_end);

    function stream_data(data){
        buffer += data;
        flush();
    }//stream_data

    function stream_end(){
        if(buffer) stream.emit('line', buffer);
    }//stream_end


    function flush(){
        var match;
        while(match = re.exec(buffer)){
            var index = match.index + match[0].length;
            stream.emit('line', buffer.substring(0, index));
            buffer = buffer.substring(index);
            re.lastIndex = 0;
        }
    }//flush

}//emitLines

Use this function on a stream and listen to the 'line' events it will emit; a small usage sketch follows.
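For example (using stdin as the stream; note that each emitted line still includes its trailing newline):

process.stdin.setEncoding('utf8');   // so chunks concatenate as text
emitLines(process.stdin);

process.stdin.on('line', function (line) {
    process.stdout.write('line: ' + line);   // line already ends with '\n'
});
process.stdin.resume();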

gr-

While you should probably use the readline module as the top answer suggests, readline appears to be oriented toward command line interfaces rather than line reading. It's also a little bit more opaque regarding buffering. (Anyone who needs a streaming line-oriented reader probably will want to tweak buffer sizes.) The readline module is ~1000 lines while this, with stats and tests, is 34.

const EventEmitter = require('events').EventEmitter;
class LineReader extends EventEmitter{
    constructor(f, delim='\n'){
        super();
        this.totalChars = 0;
        this.totalLines = 0;
        this.leftover = '';

        f.on('data', (chunk)=>{
            this.totalChars += chunk.length;
            let lines = chunk.split(delim);
            if (lines.length === 1){
                this.leftover += chunk;
                return;
            }
            lines[0] = this.leftover + lines[0];
            this.leftover = lines[lines.length-1];
            if (this.leftover) lines.pop();
            this.totalLines += lines.length;
            for (let l of lines) this.onLine(l);
        });
        // f.on('error', ()=>{});
        f.on('end', ()=>{console.log('chars', this.totalChars, 'lines', this.totalLines)});
    }
    onLine(l){
        this.emit('line', l);
    }
}
//Command line test
const f = require('fs').createReadStream(process.argv[2], 'utf8');
const delim = process.argv[3];
const lineReader = new LineReader(f, delim);
lineReader.on('line', (line)=> console.log(line));

Here's an even shorter version, without the stats, at 19 lines:

class LineReader extends require('events').EventEmitter{
    constructor(f, delim='\n'){
        super();
        this.leftover = '';
        f.on('data', (chunk)=>{
            let lines = chunk.split(delim);
            if (lines.length === 1){
                this.leftover += chunk;
                return;
            }
            lines[0] = this.leftover + lines[0];
            this.leftover = lines[lines.length-1];
            if (this.leftover) 
                lines.pop();
            for (let l of lines)
                this.emit('line', l);
        });
    }
}
const fs = require("fs")

fs.readFile('./file', 'utf-8', (err, data) => {
var innerContent = '';
    console.log("Asynchronous read: " + data.toString());
    const lines = data.toString().split('\n')
    for (let line of lines)
        innerContent += line + '<br>';


});

I wrapped the whole logic of daily line processing as an npm module: line-kit https://www.npmjs.com/package/line-kit

 // example
 var count = 0
 require('line-kit')(require('fs').createReadStream('/etc/issue'),
   (line) => { count++; },
   () => { console.log(`seen ${count} lines`) })

I use the code below to read lines, after verifying that it is not a directory and is not included in the list of files that need not be checked.

(function () {
  var fs = require('fs');
  var glob = require('glob-fs')();
  var path = require('path');
  var result = 0;
  var exclude = ['LICENSE',
    path.join('e2e', 'util', 'db-ca', 'someother-file'),
    path.join('src', 'favicon.ico')];
  var files = [];
  files = glob.readdirSync('**');

  var allFiles = [];

  var patternString = [
    'trade',
    'order',
    'market',
    'securities'
  ];

  files.map((file) => {
    try {
      if (!fs.lstatSync(file).isDirectory() && exclude.indexOf(file) === -1) {
        fs.readFileSync(file).toString().split(/\r?\n/).forEach(function(line){
          patternString.map((pattern) => {
            if (line.indexOf(pattern) !== -1) {
              console.log(file + ' contains `' + pattern + '` in line "' + line +'";');
              result = 1;
            }
          });
        });
      }
    } catch (e) {
      console.log('Error:', e.stack);
    }
  });
  process.exit(result);

})();

I have looked through all the above answers; all of them use a third-party library to solve it. There is a simple solution in Node's API. e.g.

const fs= require('fs')

let stream = fs.createReadStream('<filename>', { autoClose: true })

stream.on('data', chunk => {
    let row = chunk.toString('ascii')
})
