
Node Read Streams - How can I limit the number of open files?

I'm running into AggregateError: EMFILE: too many open files while streaming multiple files.

Machine Details: macOS Monterey, MacBook Pro (14-inch, 2021), Apple M1 Pro chip, 16 GB memory, Node v16.13.0

I've tried increasing the file-descriptor limits with no luck. Ideally I would like to either cap the number of files open at one time, or resolve this by closing each file as soon as it has been used.
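For reference, a quick way to check the soft descriptor limit the Node process actually inherits (a sketch that assumes a POSIX shell is available):

const { execSync } = require('child_process');

// `ulimit -n` is a shell builtin; the spawned shell inherits the same
// limits as this Node process, so this prints the effective soft limit.
console.log('soft fd limit:', execSync('ulimit -n').toString().trim());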

Code below. I've removed the unrelated code and replaced it with '//...'.

const MultiStream = require('multistream');
const fs = require('fs-extra'); // Also tried graceful-fs and the standard fs
const { fdir } = require("fdir");
const split2 = require('split2');
const bz2 = require('unbzip2-stream'); // the question omitted these two requires; unbzip2-stream is an assumption for the bz2 decompressor

//...

let files = [];

//...

(async() => {

  const crawler = await new fdir()
    .filter((path, isDirectory) => path.endsWith(".bz2"))
    .withFullPaths()
    .crawl("Dir/Sub Dir")
    .withPromise();

  for(const file of crawler){
    files = [...files, fs.createReadStream(file)]
  }

  const multi = new MultiStream(files)
    // Unzip
    .pipe(bz2())
    // Create chunks from lines
    .pipe(split2())
    .on('data', function (obj) {
      // Code to filter data and extract what I need
      //...
    })
    .on("error", function(error) {
      // Handling parsing errors
      //...
    })
    .on('end', function () {
      // Output results
      //...
    });

})();

Per the multistream doc, you can lazily create the read streams by changing this:

  for(const file of crawler){
    files = [...files, fs.createReadStream(file)]
  }

to this:

  let files = crawler.map((f) => {
    return function () {
      return fs.createReadStream(f);
    };
  });
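
The same doc also describes an asynchronous "factory" form: MultiStream asks the factory for the next stream only after the previous one has ended, so at most one file is open at a time. A rough sketch adapted to your crawler result (assuming it resolves to an array of path strings):

  const queue = [...crawler];

  function factory (cb) {
    // Passing null for the stream tells MultiStream there is nothing left
    if (queue.length === 0) return cb(null, null);
    cb(null, fs.createReadStream(queue.shift()));
  }

  new MultiStream(factory).pipe(bz2()).pipe(split2());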

After reading over the npm page for multistream, I think I've found something that will help. I've also edited where you add the stream to the files array, as there's no need to instantiate a new array and spread the existing elements the way you're doing.

To lazily create the streams, wrap them in a function:

var streams = [
  fs.createReadStream(__dirname + '/numbers/1.txt'),
  function () { // will be executed when the stream is active
    return fs.createReadStream(__dirname + '/numbers/2.txt')
  },
  function () { // same
    return fs.createReadStream(__dirname + '/numbers/3.txt')
  }
]

new MultiStream(streams).pipe(process.stdout) // => 123

With that, we can update your logic by wrapping the read streams in functions, so the streams are not created until they are needed. This prevents too many from being open at once. Simply update your file loop:

for(const file of crawler){
    files.push(function() {
        return fs.createReadStream(file)
    })
}
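
Putting it together, the updated script would look roughly like this (the split2 and bz2 requires are assumptions, since the question omitted them; any bzip2 decompression stream should work in place of unbzip2-stream):

const MultiStream = require('multistream');
const fs = require('fs-extra');
const { fdir } = require('fdir');
const split2 = require('split2');
const bz2 = require('unbzip2-stream'); // assumption: stand-in for the omitted bz2 require

(async () => {
  const crawler = await new fdir()
    .filter((path, isDirectory) => path.endsWith('.bz2'))
    .withFullPaths()
    .crawl('Dir/Sub Dir')
    .withPromise();

  // Each entry is a factory, so a file is opened only when MultiStream
  // reaches it and is auto-closed once it has been fully read.
  const files = crawler.map((file) => () => fs.createReadStream(file));

  new MultiStream(files)
    .pipe(bz2())
    .pipe(split2())
    .on('data', (line) => {
      // filter the data and extract what you need
    })
    .on('error', (err) => {
      // handle parsing errors
    })
    .on('end', () => {
      // output results
    });
})();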
