Node, buffer, to stream, to saving to a file

Question

I'm trying to take a file (returned as a Buffer from S3), untar it (the untar step works on a stream), and then save it to disk at /tmp/foo.

Does it even matter whether I handle the untar step (zlib.gunzip()) asynchronously if this script will only ever handle one file at a time? What do I have to gain by using streams?

var getS3Args = { bucket: 'foo', key: 'bar.tar.gz' }

lib.getS3Object(getS3Args, function(getS3ObjectResponse) {
  zlib.gunzip(getS3ObjectResponse.Body, function(err, result) {
    if(err) return console.error(err);

    // ?

    console.log(result);
    return callback(result);
  });
});

Answer 1

You can get a stream of the S3 object directly from the aws-sdk instead of a Buffer. The advantage of streams is much lower memory use, because the whole object never has to be held in memory at once. Streams work on small chunks, and each chunk can be garbage collected once it has been processed. With your current approach, downloading a 1 TB object from S3 would almost certainly fail with an out-of-memory error, because the entire Buffer would have to fit in memory. With streams you would probably never see more than a few extra MB in use: a chunk arrives from the HTTP response, is unzipped, untarred, and written to the file system on its own, without waiting for the rest of the response.

var AWS = require('aws-sdk')
var S3 = new AWS.S3()
var fs = require('fs')
var tar = require('tar')
var zlib = require('zlib')
var path = require('path')
var mkdirp = require('mkdirp')
var getS3Args = { Bucket: 'foo', Key: 'bar.tar.gz' } // aws-sdk expects capitalized Bucket and Key
var dest = '/path/to/destination'

S3.getObject(getS3Args)
  .createReadStream()
  .pipe(zlib.createUnzip())
  .pipe(tar.Parse())
  .on('entry', function(entry) {
    var isDir     = 'Directory' === entry.type
    var fullpath  = path.join(dest, entry.path)
    var directory = isDir ? fullpath : path.dirname(fullpath)
    mkdirp(directory, function(err) {
      if (err) throw err
      if (!isDir) entry.pipe(fs.createWriteStream(fullpath))
    })
  })
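
If you already have the object as a Buffer, as in the question, a minimal sketch of the literal "buffer, to stream, to file" step could look like the following. It uses only Node core modules. The helper name gunzipBufferToFile and its arguments are hypothetical, not part of the aws-sdk or of the code above. Note that gunzip only removes the gzip layer, so for bar.tar.gz the file written to disk is still a tar archive. Since the whole Buffer is already in memory, this does not give the memory savings described above.

```javascript
var fs = require('fs')
var zlib = require('zlib')
var stream = require('stream')

// Hypothetical helper (not from the original answer): gunzip an in-memory
// Buffer and write the result to destPath, then call done(err).
function gunzipBufferToFile(buffer, destPath, done) {
  // Wrap the Buffer in a stream so it can be piped like any other source.
  var source = new stream.PassThrough()
  source.end(buffer)

  // pipeline() connects the stages and reports an error from any of them.
  stream.pipeline(
    source,
    zlib.createGunzip(),
    fs.createWriteStream(destPath),
    done
  )
}
```

In the question's code this would presumably be called as gunzipBufferToFile(getS3ObjectResponse.Body, '/tmp/foo', callback), with the difference that the callback receives only an error (or nothing on success) rather than the unzipped data.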

solution1
3 2016-09-11 15:32:37
