
Piping Amazon S3 download stream through Node.js spawn causing incomplete downloads

I'm currently trying to download an encrypted file from Amazon S3 and pipe it through GPG decryption, which I'm spawning as a child process. I'm using the aws-sdk for Node ( https://github.com/aws/aws-sdk-js ).

The download usually works, but on a slower network connection (which I'm testing by throttling my network), the download will hang for a few seconds and the file will be truncated (the sizes don't match up). The file is also corrupt at that point. I believe the problem is somewhere in the spawn, as it might not be closing or finishing the response correctly because of the slower stream.

My code to download the file:

// s3 is an AWS.S3 client from the aws-sdk; gpg is the decryption module shown below
let params = {Bucket: Bucket, Key: Key};
let readStream = s3.getObject(params).createReadStream();
gpg.decrypt(readStream, res); // res is the Express response object

My code to decrypt the file and then pipe it to the response:

gpg.decrypt = function(inputStream, res) {
    let cp = require('child_process');
    let decryptionArgs = ['--decrypt', '--batch', '--yes', '--no-tty', '--passphrase', 'mypassphrase'];
    let gpg = cp.spawn('gpg', decryptionArgs);
    inputStream.on('error', (err) => res.status(500).json(err));
    gpg.on('error', (err) => res.status(500).json(err));
    // Pipe the S3 download into gpg's stdin, and gpg's decrypted output into the response
    inputStream.pipe(gpg.stdin);
    gpg.stdout.pipe(res);
}

I'm setting the Content-Type to application/octet-stream and the Content-Disposition to attachment; filename="thefilename".
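For reference, a minimal sketch of how those headers might be set on the Express response before any data is streamed (the filename value is just the placeholder used above):

res.set('Content-Type', 'application/octet-stream');
res.set('Content-Disposition', 'attachment; filename="thefilename"');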

UPDATE: I figured out the issue, in case this helps someone. On a slower network connection (tested by throttling my network), gpg.stdout would become unpiped. I tested this by setting an event listener for the unpipe event. I was able to solve the issue by using buffers instead. I still pipe the input file stream to my gpg spawn, but instead of piping the output to the response, I'm writing to the response chunk by chunk:

gpg.decrypt = function(inputStream, res) {
    let cp = require('child_process');
    let decryptionArgs = ['--decrypt', '--batch', '--yes', '--no-tty', '--passphrase', 'mypassphrase'];
    let gpg = cp.spawn('gpg', decryptionArgs);
    inputStream.on('error', (err) => res.status(500).json(err));
    gpg.on('error', (err) => res.status(500).json(err));

    // Write each decrypted chunk to the response manually instead of piping,
    // and end the response once the gpg process exits
    gpg.on('close', () => res.end());
    gpg.stdout.on('data', (chunk) => res.write(chunk));
    inputStream.pipe(gpg.stdin);
}
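For anyone who wants to reproduce the diagnosis, here is a minimal sketch of the listener mentioned above, assuming res is the response that gpg.stdout was piped into; the unpipe event fires on the writable side when its source detaches:

res.on('unpipe', (src) => {
    // src is the readable stream (gpg.stdout here) that stopped piping into the response
    console.log('source stream unpiped from the response');
});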

I am no JS expert, but I have done quite a bit of low-level raw HTTP work, and here's the problem I see.

inputStream.on('error', (err) => res.status(500) ...

This seems like it would only be able to (correctly) handle errors that occur very early in the process of fetching the file.

If an error occurs later, once the output has started, it's going to be too late, since the HTTP response has already started and a 200 OK has already been sent to the client. Once you have started streaming a response... you see the dilemma here.

I see no obvious, simple way to simultaneously stream a response and handle a late error gracefully, and depending on how the environment handles this, the JSON might even make its way into the tail of the response... changing a truncated file into a truncated file with noise at the end.
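One common workaround (a sketch, not something from the question) is to check Express's res.headersSent in the error handlers, so the JSON error is only sent while that is still possible, and the connection is aborted otherwise so the client at least sees the transfer fail:

gpg.on('error', (err) => {
    if (!res.headersSent) {
        // Nothing has been streamed yet, so a normal 500 JSON error is still possible
        res.status(500).json(err);
    } else {
        // The response is already streaming; abort it instead of appending JSON noise
        res.destroy(err);
    }
});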

But if you know the expected byte length of the output after decompression, it seems like you should be able to set the Content-Length response header. When your transfer fails and the output is truncated, the user agent (browser, curl, etc.) should eventually realize that the content is too short and throw an error... there's no way to go back in time to do much else, and the only other option would be not to use streaming -- download, decrypt, verify, return... but that may not be viable for you either.
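A sketch of that idea, assuming the decrypted size is known ahead of time (for example, stored as metadata alongside the S3 object; decryptedByteLength is just an illustrative name):

// decryptedByteLength is assumed to be known up front, e.g. from object metadata
res.set('Content-Length', String(decryptedByteLength));
gpg.stdout.pipe(res);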
