Node.js streams and data disappearing

I've been playing with Readable and Transform streams, and I can't solve the mystery of the disappearing lines.

Consider a text file whose lines contain the sequential numbers from 1 to 20000:

$ seq 1 20000 > file.txt

I create a Readable stream and a LineStream (from a library called byline: npm install byline; I'm using version 4.1.1):

var file = (require('fs')).createReadStream('file.txt');
var lines = new (require('byline').LineStream)();

Consider the following code:

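// Attached second (after 1500 ms): prints every line the lines stream emits.
setTimeout(function() {
  lines.on('readable', function() {
    var line;
    while (null !== (line = lines.read())) {
      console.log(line);
    }
  });
}, 1500);

// Attached first (after 1000 ms): manually forwards file chunks into the lines stream.
setTimeout(function() {
  file.on('readable', function() {
    var chunk;
    while (null !== (chunk = file.read())) {
      lines.write(chunk);
    }
  });
}, 1000);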

Notice that it first attaches a listener to the 'readable' event of the file Readable stream, which writes to the lines stream, and only half a second later attaches a listener to the 'readable' event of the lines stream, which simply prints lines to the console.

If I run this code, it prints only 16384 (which is 2^14) lines and stops. It doesn't finish the file. However, if I change the 1500ms timeout to 500ms, effectively swapping the order in which the listeners are attached, it happily prints the whole file.

I've tried playing with highWaterMark, specifying the number of bytes to read from the file stream, and attaching listeners to other events of the lines stream, all in vain.

What can explain this behavior?

Thanks!

I think this behaviour can be explained by two things:

  1. How you use streams.
  2. How byline works.

What you are doing is manual piping. The problem with it is that your loop ignores the return value of lines.write(), so it doesn't respect highWaterMark and forces the whole file to be buffered.
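To make the backpressure point concrete, here is a minimal sketch using only Node's built-in stream module. The makeStalledSink and writeIgnoringBackpressure helpers are hypothetical names invented for this illustration; they are not part of byline or of the question's code. The sink never completes a write, so everything written stays counted against its highWaterMark, and write() starts returning false once that mark is reached.

```javascript
const { Writable } = require('stream');

// Hypothetical sink for illustration: it never calls its write callback,
// so every chunk written stays counted in the internal buffer.
function makeStalledSink(highWaterMark) {
  return new Writable({
    highWaterMark,
    write(chunk, encoding, callback) {
      // Intentionally never call callback(): the consumer is "stuck".
    },
  });
}

// Manual piping as in the question: the return value of write() is
// collected here only so we can inspect it; the loop never waits for 'drain'.
function writeIgnoringBackpressure(sink, chunks) {
  const results = [];
  for (const chunk of chunks) {
    results.push(sink.write(chunk));
  }
  return results;
}

const sink = makeStalledSink(4); // 4-byte highWaterMark
const results = writeIgnoringBackpressure(sink, ['ab', 'cd', 'ef']);

console.log(results);             // [ true, false, false ]
console.log(sink.writableLength); // 6 bytes buffered, above the 4-byte mark
```

A false return value is the stream's request to stop writing until 'drain' fires. pipe() honours that request, while a bare write() loop keeps going and lets the buffer grow.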

All this causes byline to behave badly. See this: https://github.com/jahewson/node-byline/blob/master/lib/byline.js#L110-L112. It means that byline stops pushing lines when the buffer's length exceeds highWaterMark. But this doesn't make any sense! It doesn't prevent memory usage from growing (the lines are still stored in a separate line buffer), but the stream doesn't know about these lines, and if it ends in the overflowed state, they are lost forever.

What you can do:

  1. Use pipe
  2. Modify highWaterMark: lines._readableState.highWaterMark = Infinity;
  3. Stop using byline
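As a rough illustration of options 1 and 3 together, the sketch below pipes a source into a hand-written line splitter and attaches the consumer late, as the question does. createLineSplitter and countLines are hypothetical helpers written for this example, not byline's implementation, and the in-memory source merely stands in for fs.createReadStream('file.txt'). It assumes chunks never split a multi-byte character, which holds for the numeric file in the question.

```javascript
const { Readable, Transform } = require('stream');

// Hypothetical minimal line splitter (not byline's code). It pushes every
// complete line and keeps only the trailing partial line between chunks.
// Assumption: chunks do not split multi-byte characters.
function createLineSplitter() {
  let tail = '';
  return new Transform({
    readableObjectMode: true,
    transform(chunk, encoding, callback) {
      const parts = (tail + chunk.toString()).split('\n');
      tail = parts.pop(); // last piece may be an incomplete line
      for (const line of parts) this.push(line);
      callback();
    },
    flush(callback) {
      if (tail.length > 0) this.push(tail); // emit a final unterminated line
      callback();
    },
  });
}

// Pipes the source through the splitter, then attaches the consumer only
// after delayMs. Resolves with the number of lines received.
function countLines(source, delayMs) {
  return new Promise((resolve, reject) => {
    const lines = source.pipe(createLineSplitter());
    lines.on('error', reject);
    let count = 0;
    setTimeout(() => {
      lines.on('data', () => { count += 1; });
      lines.on('end', () => resolve(count));
    }, delayMs);
  });
}

// In-memory stand-in for file.txt: the numbers 1..20000, cut into 1 KiB chunks.
const text = Array.from({ length: 20000 }, (_, i) => String(i + 1)).join('\n') + '\n';
const chunks = [];
for (let i = 0; i < text.length; i += 1024) chunks.push(text.slice(i, i + 1024));

countLines(Readable.from(chunks), 50).then((n) => console.log('lines received:', n));
```

Because pipe() pauses the source whenever the splitter's buffers fill up, the late consumer should still see every line once it starts reading; nothing has to sit in a side buffer that the stream doesn't know about.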
