
Node, buffer, to stream, to saving to a file

I'm trying to take a file (returned as a Buffer from S3), untar it (which is a stream), and then save it to disk at /tmp/foo.

Does it even matter if I handle the untar ( zlib.gunzip() ) function asynchronously if this script will only ever handle one file at a time? What do I have to gain by using streams?

var getS3Args = { bucket: 'foo', key: 'bar.tar.gz' }

lib.getS3Object(getS3Args, function(getS3ObjectResponse) {
  zlib.gunzip(getS3ObjectResponse.Body, function(err, result) {
    if(err) return console.error(err);

    // ?

    console.log(result);
    return callback(result);
  });
});

You can get a stream of the data in S3 directly from the aws-sdk. The advantage of using streams is that they use much less memory, because the entire buffer never needs to be in memory to operate on it. Streams work on small chunks at a time, and those chunks get garbage collected after they've been processed. With your current method, if you wanted to download a 1TB blob from S3 you would likely get an out-of-memory error, because there's no way you could fit the entire buffer in memory. With streams you would probably never see more than a few extra MB of memory in use: each chunk comes down from the HTTP response, gets unzipped, untarred, and written to your file system on its own, without waiting for the entire HTTP response.

var AWS = require('aws-sdk')
var S3 = new AWS.S3()
var fs = require('fs')
var tar = require('tar')
var zlib = require('zlib')
var path = require('path')
var mkdirp = require('mkdirp')
var getS3Args = { Bucket: 'foo', Key: 'bar.tar.gz' } // aws-sdk expects capitalized Bucket/Key
var dest = '/path/to/destination'

S3.getObject(getS3Args)
  .createReadStream()
  .pipe(zlib.createUnzip()) // decompress the gzip layer
  .pipe(tar.Parse())        // then parse the tar stream entry by entry
  .on('entry', function(entry) {
    var isDir     = 'Directory' === entry.type
    var fullpath  = path.join(dest, entry.path)
    var directory = isDir ? fullpath : path.dirname(fullpath)
    mkdirp(directory, function(err) {
      if (err) throw err
      if (!isDir) entry.pipe(fs.createWriteStream(fullpath))
    })
  })
