
Node.js streams and data disappearing

I've been playing with Readable and Transform streams, and I can't solve the mystery of some disappearing lines.

Consider a text file in which the lines contain sequential numbers, from 1 to 20000:

$ seq 1 20000 > file.txt

I create a Readable stream and a LineStream (from a library called byline: npm install byline; I'm using version 4.1.1):

var file = (require('fs')).createReadStream('file.txt');
var lines = new (require('byline').LineStream)();

Consider the following code:

setTimeout(function() {
  lines.on('readable', function() {
    var line;
    while (null !== (line = lines.read())) {
      console.log(line);
    } 
  });
}, 1500);

setTimeout(function() {
  file.on('readable', function() {
    var chunk;
    while (null !== (chunk = file.read())) {
      lines.write(chunk);
    }
  }); 
}, 1000);

Notice that it first attaches a listener to the 'readable' event of the file Readable stream, which writes to the lines stream, and only half a second later it attaches a listener to the 'readable' event of the lines stream, which simply prints lines to the console.

If I run this code, it will only print 16384 (which is 2^14) lines and stop. It won't finish the file. However, if I change the 1500ms timeout to 500ms -- effectively swapping the order in which the listeners are attached, it will happily print the whole file.

I've tried playing with highWaterMark, with specifying an amount of bytes to read from the file stream, attaching listeners to other events of the lines stream, all in vain.

What can explain this behavior?

Thanks!

I think this behaviour can be explained with two things:

  1. How you use streams.
  2. How byline works.

What you are doing is manual piping. The problem with it is that it doesn't respect highWaterMark (the return value of lines.write() is ignored) and forces the whole file to be buffered.
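To make the backpressure signal concrete, here is a minimal sketch that uses only Node's built-in stream module. The PassThrough stream stands in for the lines stream and the 4-byte highWaterMark is an arbitrary value chosen for illustration; neither comes from the question.

```javascript
const { PassThrough } = require('stream');

// A stream that nobody reads from, with a deliberately tiny buffer.
// (Stand-in for `lines`; the 4-byte limit is only for illustration.)
const unread = new PassThrough({ highWaterMark: 4 });

const accepted = [];
for (let i = 0; i < 10; i++) {
  // write() returns false once the stream asks the producer to stop
  // and wait for the 'drain' event. The chunk is still buffered anyway.
  accepted.push(unread.write(Buffer.from('ab')));
}

// Manual piping that ignores this flag just keeps writing.
console.log(accepted.includes(false));
```

A well-behaved producer would stop writing when it sees false and continue on 'drain', which is what pipe() does internally.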

All of this makes byline misbehave. See https://github.com/jahewson/node-byline/blob/master/lib/byline.js#L110-L112 : byline stops pushing lines once the readable buffer's length exceeds highWaterMark. But this doesn't make any sense! It doesn't limit memory growth (the lines are still stored in byline's own line buffer), yet the stream doesn't know about those lines, and if it ends while in this overflowed state, they are lost forever.

What you can do:

  1. Use pipe: file.pipe(lines);
  2. Modify highWaterMark: lines._readableState.highWaterMark = Infinity; (note that _readableState is an internal property)
  3. Stop using byline
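Options 1 and 3 can be combined. The sketch below is only an illustration: LineSplitter and countLines are hypothetical names, not part of byline or Node, and the splitter assumes ASCII input such as the numbers file from the question. The input is generated in memory so the example runs without file.txt.

```javascript
const { Readable, Transform } = require('stream');

// Hypothetical stand-in for byline's LineStream: splits text on '\n'.
// Assumes ASCII input; multi-byte characters split across chunks
// would need a StringDecoder.
class LineSplitter extends Transform {
  constructor() {
    super({ readableObjectMode: true });
    this.tail = ''; // incomplete line carried over to the next chunk
  }

  _transform(chunk, encoding, callback) {
    const parts = (this.tail + chunk.toString()).split('\n');
    this.tail = parts.pop(); // last piece may be an unfinished line
    for (const line of parts) this.push(line);
    callback();
  }

  _flush(callback) {
    // Emit whatever is left when the source ends without a final '\n'.
    if (this.tail !== '') this.push(this.tail);
    callback();
  }
}

// Hypothetical helper: pipes a source through the splitter and counts lines.
function countLines(source) {
  return new Promise((resolve, reject) => {
    let count = 0;
    const splitter = new LineSplitter();
    splitter.on('data', () => { count += 1; });
    splitter.on('end', () => resolve(count));
    splitter.on('error', reject);
    source.on('error', reject);
    source.pipe(splitter); // pipe() handles backpressure for us
  });
}

// In-memory equivalent of `seq 1 20000`.
function* numbers(n) {
  for (let i = 1; i <= n; i++) yield i + '\n';
}

countLines(Readable.from(numbers(20000))).then((n) => console.log(n));
```

With the file from the question, the call would presumably become countLines(require('fs').createReadStream('file.txt')).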
